Overview

Dataset statistics

Number of variables: 30
Number of observations: 10296
Missing cells: 39658
Missing cells (%): 12.8%
Duplicate rows: 29
Duplicate rows (%): 0.3%
Total size in memory: 2.3 MiB
Average record size in memory: 234.0 B

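The percentages above follow directly from the counts. Missing cells are divided by rows × variables, and duplicate rows by rows. The memory total is consistent with 234.0 B per record, although that per-record figure is itself rounded. The following check uses only the reported numbers:

```python
# Counts taken from the Dataset statistics table.
n_vars = 30
n_obs = 10296
missing_cells = 39658
duplicate_rows = 29
avg_record_bytes = 234.0

total_cells = n_vars * n_obs                     # every (row, column) slot
missing_pct = 100 * missing_cells / total_cells  # share of all cells that are empty
duplicate_pct = 100 * duplicate_rows / n_obs     # share of rows that repeat another row
total_mib = avg_record_bytes * n_obs / 2**20     # bytes -> MiB

print(f"missing {missing_pct:.1f}%, duplicates {duplicate_pct:.1f}%, ~{total_mib:.1f} MiB")
```

This prints `missing 12.8%, duplicates 0.3%, ~2.3 MiB`, which matches the table.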
Variable types

Numeric: 15
Text: 3
Categorical: 10
Boolean: 2

Alerts

description_exists has constant value "True" [Constant]
Dataset has 29 (0.3%) duplicate rows [Duplicates]
carvana_ad is highly overall correlated with condition and 12 other fields [High correlation]
condition is highly overall correlated with carvana_ad [High correlation]
drive is highly overall correlated with type [High correlation]
manufacturer is highly overall correlated with org_manuf [High correlation]
odometer is highly overall correlated with carvana_ad and 2 other fields [High correlation]
org_manuf is highly overall correlated with manufacturer [High correlation]
price is highly overall correlated with odometer and 1 other field [High correlation]
tfidf_auto is highly overall correlated with carvana_ad [High correlation]
tfidf_car is highly overall correlated with carvana_ad [High correlation]
tfidf_credit is highly overall correlated with carvana_ad [High correlation]
tfidf_miles is highly overall correlated with carvana_ad [High correlation]
tfidf_new is highly overall correlated with carvana_ad [High correlation]
tfidf_power is highly overall correlated with carvana_ad [High correlation]
tfidf_rear is highly overall correlated with carvana_ad [High correlation]
tfidf_text is highly overall correlated with carvana_ad [High correlation]
tfidf_truck is highly overall correlated with carvana_ad [High correlation]
tfidf_vehicle is highly overall correlated with carvana_ad [High correlation]
transmission is highly overall correlated with carvana_ad [High correlation]
type is highly overall correlated with drive [High correlation]
year is highly overall correlated with odometer and 1 other field [High correlation]
title_status is highly imbalanced (90.3%) [Imbalance]
fuel is highly imbalanced (61.8%) [Imbalance]
tfidf_auto has 1895 (18.4%) missing values [Missing]
model has 110 (1.1%) missing values [Missing]
tfidf_miles has 1895 (18.4%) missing values [Missing]
condition has 3861 (37.5%) missing values [Missing]
tfidf_power has 1895 (18.4%) missing values [Missing]
tfidf_new has 1895 (18.4%) missing values [Missing]
title_status has 174 (1.7%) missing values [Missing]
tfidf_vehicle has 1895 (18.4%) missing values [Missing]
lat has 111 (1.1%) missing values [Missing]
type has 2118 (20.6%) missing values [Missing]
tfidf_text has 1895 (18.4%) missing values [Missing]
org_manuf has 397 (3.9%) missing values [Missing]
tfidf_truck has 1895 (18.4%) missing values [Missing]
tfidf_credit has 1895 (18.4%) missing values [Missing]
cylinders has 4248 (41.3%) missing values [Missing]
drive has 3076 (29.9%) missing values [Missing]
tfidf_rear has 1895 (18.4%) missing values [Missing]
paint_color has 2867 (27.8%) missing values [Missing]
long has 111 (1.1%) missing values [Missing]
manufacturer has 3505 (34.0%) missing values [Missing]
tfidf_car has 1895 (18.4%) missing values [Missing]
tfidf_auto has 5296 (51.4%) zeros [Zeros]
tfidf_miles has 4015 (39.0%) zeros [Zeros]
tfidf_power has 4568 (44.4%) zeros [Zeros]
tfidf_new has 5151 (50.0%) zeros [Zeros]
tfidf_vehicle has 3702 (36.0%) zeros [Zeros]
tfidf_text has 4807 (46.7%) zeros [Zeros]
tfidf_truck has 6573 (63.8%) zeros [Zeros]
tfidf_credit has 5168 (50.2%) zeros [Zeros]
tfidf_rear has 5567 (54.1%) zeros [Zeros]
tfidf_car has 5227 (50.8%) zeros [Zeros]

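The Imbalance percentages match one minus the normalized Shannon entropy of the category counts, 1 − H / log2(k), where k is the number of categories. Applied to the title_status counts listed in that variable's section, the sketch below gives 90.3%, which is the figure in the alert. Treating this as the report's exact formula is an assumption based on that match:

```python
import math

def imbalance_score(counts):
    """One minus the normalized Shannon entropy of a category distribution.

    0.0 means all categories are equally frequent; values near 1.0 mean a
    single category dominates.
    """
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# title_status counts from its variable section (missing rows excluded).
title_status = {"clean": 9802, "rebuilt": 183, "salvage": 90,
                "lien": 30, "missing": 15, "parts only": 2}
score = imbalance_score(list(title_status.values()))
print(f"title_status imbalance: {score:.1%}")
```

Normalizing by log2(k) keeps the score comparable between a 6-category column like title_status and columns with more or fewer categories.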
Reproduction

Analysis started: 2024-10-24 17:28:48.343554
Analysis finished: 2024-10-24 17:29:39.978120
Duration: 51.63 seconds
Software version: ydata-profiling v4.10.0
Download configuration: config.json

Variables

tfidf_auto
Real number (ℝ)

HIGH CORRELATION, MISSING, ZEROS

Distinct: 3024
Distinct (%): 36.0%
Missing: 1895
Missing (%): 18.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.042166903
Minimum: 0
Maximum: 0.66149077
Zeros: 5296
Zeros (%): 51.4%
Negative: 0
Negative (%): 0.0%
Memory size: 160.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0.061440289
95th percentile: 0.20988407
Maximum: 0.66149077
Range: 0.66149077
Interquartile range (IQR): 0.061440289

Descriptive statistics

Standard deviation: 0.074599757
Coefficient of variation (CV): 1.7691543
Kurtosis: 5.4051015
Mean: 0.042166903
Median Absolute Deviation (MAD): 0
Skewness: 2.1797535
Sum: 354.24415
Variance: 0.0055651237
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value: count (frequency)
0: 5296 (51.4%)
0.0154114841: 3 (< 0.1%)
0.04542635118: 3 (< 0.1%)
0.2085570748: 3 (< 0.1%)
0.0937732982: 3 (< 0.1%)
0.1105257677: 3 (< 0.1%)
0.1494165855: 3 (< 0.1%)
0.1080258248: 3 (< 0.1%)
0.1974122447: 2 (< 0.1%)
0.1012856125: 2 (< 0.1%)
Other values (3014): 3080 (29.9%)
(Missing): 1895 (18.4%)

Minimum 10 values
Value: count (frequency)
0: 5296 (51.4%)
0.001298957247: 1 (< 0.1%)
0.005423575805: 2 (< 0.1%)
0.006002047607: 1 (< 0.1%)
0.00618822919: 1 (< 0.1%)
0.006866204083: 2 (< 0.1%)
0.006943066934: 2 (< 0.1%)
0.007040687998: 1 (< 0.1%)
0.007072917092: 1 (< 0.1%)
0.007148471259: 1 (< 0.1%)

Maximum 10 values
Value: count (frequency)
0.6614907717: 1 (< 0.1%)
0.5402635855: 1 (< 0.1%)
0.527838005: 1 (< 0.1%)
0.5143600559: 1 (< 0.1%)
0.5037958612: 1 (< 0.1%)
0.503396986: 1 (< 0.1%)
0.481138777: 1 (< 0.1%)
0.4803877725: 1 (< 0.1%)
0.4794363711: 1 (< 0.1%)
0.4793805561: 1 (< 0.1%)

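Several tfidf_auto figures are derived from others. The CV is σ/μ and the variance is σ². The mean and Distinct (%) are computed over the 8401 non-missing rows, not over all 10296 rows. This consistency check uses only the reported values:

```python
# Reported values for tfidf_auto.
mean = 0.042166903
std = 0.074599757
total = 354.24415
n_present = 10296 - 1895                 # observations minus missing rows

cv = std / mean                          # coefficient of variation: spread relative to the mean
variance = std ** 2                      # variance is the squared standard deviation
mean_from_sum = total / n_present        # the mean is taken over non-missing rows only
distinct_pct = 100 * 3024 / n_present    # Distinct (%) uses the same denominator

print(round(cv, 4), round(variance, 7), round(mean_from_sum, 6), round(distinct_pct, 1))
```

Dividing 3024 by all 10296 rows would give about 29.4% instead of the reported 36.0%. That difference is a quick way to tell which denominator a percentage uses.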
price
Real number (ℝ)

HIGH CORRELATION 

Distinct: 2178
Distinct (%): 21.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19925.813
Minimum: 2100
Maximum: 120000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 160.9 KiB

Quantile statistics

Minimum: 2100
5th percentile: 3750
Q1: 8250
Median: 16950
Q3: 28727.5
95th percentile: 44995
Maximum: 120000
Range: 117900
Interquartile range (IQR): 20477.5

Descriptive statistics

Standard deviation: 14243.454
Coefficient of variation (CV): 0.7148242
Kurtosis: 2.4970546
Mean: 19925.813
Median Absolute Deviation (MAD): 9550.5
Skewness: 1.2570359
Sum: 2.0515617 × 10^8
Variance: 2.0287597 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value: count (frequency)
6995: 93 (0.9%)
8995: 92 (0.9%)
6500: 88 (0.9%)
29990: 84 (0.8%)
26990: 79 (0.8%)
25990: 76 (0.7%)
7500: 76 (0.7%)
3500: 74 (0.7%)
4500: 74 (0.7%)
8500: 71 (0.7%)
Other values (2168): 9489 (92.2%)

Minimum 10 values
Value: count (frequency)
2100: 11 (0.1%)
2195: 1 (< 0.1%)
2200: 14 (0.1%)
2250: 5 (< 0.1%)
2299: 2 (< 0.1%)
2300: 9 (0.1%)
2335: 1 (< 0.1%)
2350: 3 (< 0.1%)
2388: 1 (< 0.1%)
2400: 13 (0.1%)

Maximum 10 values
Value: count (frequency)
120000: 1 (< 0.1%)
111111: 1 (< 0.1%)
109999: 1 (< 0.1%)
106999: 1 (< 0.1%)
105000: 2 (< 0.1%)
100000: 1 (< 0.1%)
97995: 1 (< 0.1%)
95900: 1 (< 0.1%)
95000: 3 (< 0.1%)
94995: 1 (< 0.1%)

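The price Q3 of 28727.5 is not a whole-dollar price, which points to interpolated quantiles. The sketch below implements linear interpolation between the closest ranks, the default method of numpy.quantile and pandas.Series.quantile. Whether ydata-profiling uses exactly this rule is an assumption, and the price list is a hypothetical sample:

```python
def quantile_linear(values, q):
    """Quantile by linear interpolation at position (n - 1) * q of the sorted data."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q
    lo = int(pos)                     # index of the lower neighbour
    hi = min(lo + 1, len(xs) - 1)     # index of the upper neighbour (clamped at the end)
    frac = pos - lo                   # how far to move from xs[lo] towards xs[hi]
    return xs[lo] + (xs[hi] - xs[lo]) * frac

# Hypothetical whole-dollar prices; odd gaps between neighbours produce .5 quartiles.
prices = [2100, 3500, 6995, 16950, 29990, 44995, 120000]
q1 = quantile_linear(prices, 0.25)
q3 = quantile_linear(prices, 0.75)
print(q1, q3, q3 - q1)                # the last value is the IQR
```

With seven prices, Q1 sits halfway between the 2nd and 3rd sorted values and Q3 halfway between the 5th and 6th. That placement is how half-dollar quartiles arise.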
model
Text

MISSING 

Distinct: 3502
Distinct (%): 34.4%
Missing: 110
Missing (%): 1.1%
Memory size: 160.9 KiB

Length

Max length: 178
Median length: 161
Mean length: 12.359611
Min length: 1

Characters and Unicode

Total characters: 125895
Distinct characters: 79
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2247
Unique (%): 22.1%

Sample

1st row: 3 series
2nd row: ranger supercrew lariat
3rd row: romeo stelvio ti sport
4th row: s320
5th row: corolla

Words
Value: count (frequency)
sport: 597 (2.6%)
4d: 583 (2.6%)
1500: 562 (2.5%)
sedan: 468 (2.1%)
cab: 430 (1.9%)
silverado: 404 (1.8%)
f-150: 257 (1.1%)
grand: 228 (1.0%)
super: 226 (1.0%)
4x4: 225 (1.0%)
Other values (1938): 18828 (82.5%)

Most occurring characters

Value: count (frequency)
(space): 12624 (10.0%)
e: 9912 (7.9%)
r: 9108 (7.2%)
a: 9041 (7.2%)
s: 7025 (5.6%)
t: 6547 (5.2%)
i: 5938 (4.7%)
o: 5725 (4.5%)
l: 5227 (4.2%)
c: 5192 (4.1%)
Other values (69): 49556 (39.4%)

Most occurring categories

Value: count (frequency)
(unknown): 125895 (100.0%)

Most occurring scripts

Value: count (frequency)
(unknown): 125895 (100.0%)

Most occurring blocks

Value: count (frequency)
(unknown): 125895 (100.0%)

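The model section distinguishes Distinct (the number of different values) from Unique (values that occur exactly once). Its character tables count every character of every non-missing value. The sketch below shows those definitions on a hypothetical six-row sample. The whitespace word split is an assumption, and the report's own tokenizer may differ:

```python
from collections import Counter

# Hypothetical sample of model values; None marks a missing entry.
models = ["3 series", "corolla", "corolla", "s320", None, "ranger supercrew lariat"]

present = [m for m in models if m is not None]       # missing rows are excluded
counts = Counter(present)

distinct = len(counts)                                # number of different values
unique = sum(1 for c in counts.values() if c == 1)    # values that occur exactly once
distinct_pct = 100 * distinct / len(present)          # relative to non-missing rows

total_chars = sum(len(m) for m in present)            # "Total characters"
mean_length = total_chars / len(present)              # "Mean length"
char_freq = Counter("".join(present))                 # basis of "Most occurring characters"
word_freq = Counter(w for m in present for w in m.split())  # whitespace tokens (assumption)

print(distinct, unique, distinct_pct, mean_length, char_freq.most_common(1))
```

Spaces count as characters here. That is why the space character tops the model character table: multi-word values such as "ranger supercrew lariat" contribute one space per word break.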
tfidf_miles
Real number (ℝ)

HIGH CORRELATION, MISSING, ZEROS

Distinct: 4279
Distinct (%): 50.9%
Missing: 1895
Missing (%): 18.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.044819933
Minimum: 0
Maximum: 0.64007722
Zeros: 4015
Zeros (%): 39.0%
Negative: 0
Negative (%): 0.0%
Memory size: 160.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0.0096411751
Q3: 0.068726804
95th percentile: 0.18419527
Maximum: 0.64007722
Range: 0.64007722
Interquartile range (IQR): 0.068726804

Descriptive statistics

Standard deviation: 0.068943072
Coefficient of variation (CV): 1.5382234
Kurtosis: 6.1365082
Mean: 0.044819933
Median Absolute Deviation (MAD): 0.0096411751
Skewness: 2.1398578
Sum: 376.53226
Variance: 0.0047531472
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value: count (frequency)
0: 4015 (39.0%)
0.02557244391: 3 (< 0.1%)
0.04671041119: 3 (< 0.1%)
0.02298754203: 3 (< 0.1%)
0.04360660947: 3 (< 0.1%)
0.02796926203: 3 (< 0.1%)
0.02572036067: 3 (< 0.1%)
0.02600440516: 3 (< 0.1%)
0.07558347187: 3 (< 0.1%)
0.02643624384: 3 (< 0.1%)
Other values (4269): 4359 (42.3%)
(Missing): 1895 (18.4%)

Minimum 10 values
Value: count (frequency)
0: 4015 (39.0%)
0.001080646981: 1 (< 0.1%)
0.004472179671: 1 (< 0.1%)
0.004487536387: 1 (< 0.1%)
0.004489532814: 1 (< 0.1%)
0.004491282989: 1 (< 0.1%)
0.004493155209: 1 (< 0.1%)
0.004495259415: 1 (< 0.1%)
0.004498209133: 1 (< 0.1%)
0.004512058295: 2 (< 0.1%)

Maximum 10 values
Value: count (frequency)
0.6400772192: 1 (< 0.1%)
0.5541132506: 1 (< 0.1%)
0.5510313243: 1 (< 0.1%)
0.5437392713: 1 (< 0.1%)
0.5261941696: 1 (< 0.1%)
0.4997333429: 1 (< 0.1%)
0.4973328746: 1 (< 0.1%)
0.4854199203: 1 (< 0.1%)
0.47155349: 1 (< 0.1%)
0.4697453658: 1 (< 0.1%)

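For tfidf_miles the MAD equals the median (0.0096411751). This is consistent with an unscaled MAD, with no 1.4826 normal-consistency factor, on data where nearly half the non-missing values are zero. Each zero lies exactly one median-width from the median, and the median of the absolute deviations lands on that width. The sketch below uses a hypothetical weight vector:

```python
import statistics

def median_abs_deviation(values):
    """Unscaled MAD: the median of absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median(abs(x - med) for x in values)

# Hypothetical TF-IDF weights: 4 of 9 are zero, so the median sits just above zero.
weights = [0, 0, 0, 0, 0.01, 0.05, 0.07, 0.2, 0.6]
print(statistics.median(weights), median_abs_deviation(weights))
```

The same effect explains the MAD of 0 for tfidf_auto, tfidf_power, and tfidf_new. When more than half the non-missing values are zero, both the median and the MAD collapse to zero.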
condition
Categorical

HIGH CORRELATION, MISSING

Distinct: 5
Distinct (%): 0.1%
Missing: 3861
Missing (%): 37.5%
Memory size: 160.9 KiB
-1: 3363
1: 2433
2: 514
-2: 113
-3: 12

Length

Max length: 2
Median length: 2
Mean length: 1.5420357
Min length: 1

Characters and Unicode

Total characters: 9923
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: -1
2nd row: -1
3rd row: 1
4th row: -1
5th row: 1

Common Values

Value: count (frequency)
-1: 3363 (32.7%)
1: 2433 (23.6%)
2: 514 (5.0%)
-2: 113 (1.1%)
-3: 12 (0.1%)
(Missing): 3861 (37.5%)

Length

Histogram of lengths of the category

Common Values (Plot)

Words
Value: count (frequency)
1: 5796 (90.1%)
2: 627 (9.7%)
3: 12 (0.2%)

Most occurring characters

Value: count (frequency)
1: 5796 (58.4%)
-: 3488 (35.2%)
2: 627 (6.3%)
3: 12 (0.1%)

Most occurring categories

Value: count (frequency)
(unknown): 9923 (100.0%)

Most occurring scripts

Value: count (frequency)
(unknown): 9923 (100.0%)

Most occurring blocks

Value: count (frequency)
(unknown): 9923 (100.0%)

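The condition percentages use two denominators. The Common Values table divides by all 10296 rows, so the five codes plus (Missing) sum to 100%. The word-level table that lists "1: 5796" divides by the 6435 rows that carry a code. The following check uses the reported counts:

```python
# Condition codes and the missing count, as reported above.
counts = {"-1": 3363, "1": 2433, "2": 514, "-2": 113, "-3": 12}
missing = 3861

n_present = sum(counts.values())   # rows that carry a condition code
n_rows = n_present + missing       # all rows in the dataset

for code, c in counts.items():
    share_all = 100 * c / n_rows          # denominator of the Common Values table
    share_present = 100 * c / n_present   # denominator once missing rows are dropped
    print(f"{code:>2}: {share_all:4.1f}% of all rows, {share_present:4.1f}% of coded rows")

# The word-level table lists "1" with 5796 rows: exactly "-1" plus "1",
# which suggests its tokenizer drops the minus sign.
print(counts["-1"] + counts["1"], round(100 * (counts["-1"] + counts["1"]) / n_present, 1))
```

The last line prints `5796 90.1`, matching that table. So its "1" and "2" entries pool the positive and negative codes and should not be read as category counts.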
state
Text

Distinct: 51
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 160.9 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 20592
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: va
2nd row: mi
3rd row: ks
4th row: az
5th row: ca

Words
Value: count (frequency)
ca: 1199 (11.6%)
fl: 715 (6.9%)
tx: 573 (5.6%)
ny: 458 (4.4%)
mi: 441 (4.3%)
oh: 439 (4.3%)
or: 355 (3.4%)
pa: 321 (3.1%)
co: 318 (3.1%)
nc: 312 (3.0%)
Other values (41): 5165 (50.2%)

Most occurring characters

Value: count (frequency)
a: 3208 (15.6%)
c: 2187 (10.6%)
n: 1926 (9.4%)
i: 1644 (8.0%)
m: 1447 (7.0%)
o: 1355 (6.6%)
t: 1236 (6.0%)
l: 1196 (5.8%)
f: 715 (3.5%)
w: 635 (3.1%)
Other values (14): 5043 (24.5%)

Most occurring categories

Value: count (frequency)
(unknown): 20592 (100.0%)

Most occurring scripts

Value: count (frequency)
(unknown): 20592 (100.0%)

Most occurring blocks

Value: count (frequency)
(unknown): 20592 (100.0%)

tfidf_power
Real number (ℝ)

HIGH CORRELATION, MISSING, ZEROS

Distinct: 3693
Distinct (%): 44.0%
Missing: 1895
Missing (%): 18.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.05847928
Minimum: 0
Maximum: 0.59413495
Zeros: 4568
Zeros (%): 44.4%
Negative: 0
Negative (%): 0.0%
Memory size: 160.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0.099639715
95th percentile: 0.24595528
Maximum: 0.59413495
Range: 0.59413495
Interquartile range (IQR): 0.099639715

Descriptive statistics

Standard deviation: 0.090608517
Coefficient of variation (CV): 1.5494123
Kurtosis: 3.7034253
Mean: 0.05847928
Median Absolute Deviation (MAD): 0
Skewness: 1.8698024
Sum: 491.28443
Variance: 0.0082099034
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value: count (frequency)
0: 4568 (44.4%)
0.02997977763: 6 (0.1%)
0.05505375296: 4 (< 0.1%)
0.1228374255: 4 (< 0.1%)
0.0155545393: 3 (< 0.1%)
0.1923413504: 3 (< 0.1%)
0.01612965963: 3 (< 0.1%)
0.02744034279: 3 (< 0.1%)
0.1322471159: 3 (< 0.1%)
0.1475943979: 3 (< 0.1%)
Other values (3683): 3801 (36.9%)
(Missing): 1895 (18.4%)

Minimum 10 values
Value: count (frequency)
0: 4568 (44.4%)
0.004500882071: 1 (< 0.1%)
0.004544849028: 1 (< 0.1%)
0.004568178518: 1 (< 0.1%)
0.004578363538: 1 (< 0.1%)
0.004608579438: 1 (< 0.1%)
0.004612099701: 1 (< 0.1%)
0.004625619689: 1 (< 0.1%)
0.004643681413: 1 (< 0.1%)
0.004775087211: 1 (< 0.1%)

Maximum 10 values
Value: count (frequency)
0.5941349492: 1 (< 0.1%)
0.5763887728: 1 (< 0.1%)
0.5628034591: 1 (< 0.1%)
0.559433739: 1 (< 0.1%)
0.5517589513: 1 (< 0.1%)
0.5368876003: 1 (< 0.1%)
0.5368273107: 1 (< 0.1%)
0.5313440329: 1 (< 0.1%)
0.5287529745: 1 (< 0.1%)
0.5123783217: 1 (< 0.1%)

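tfidf_power is right-skewed (skewness 1.8698024) and heavy-tailed (kurtosis 3.7034253). The sketch below assumes the report relies on pandas' bias-corrected estimators, where kurtosis is *excess* kurtosis and a normal distribution scores 0. Under that assumption, the two statistics can be written out in plain Python:

```python
import math

def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness (G1), the estimator behind pandas.Series.skew."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs)
    s3 = sum((x - mean) ** 3 for x in xs)
    return n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5

def sample_excess_kurtosis(xs):
    """Bias-corrected excess kurtosis (G2), the estimator behind pandas.Series.kurt."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs)
    s4 = sum((x - mean) ** 4 for x in xs)
    a = (n + 1) * n * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
    b = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return a - b

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 3, 4, 10]
print(sample_skewness(symmetric), sample_excess_kurtosis(symmetric))
print(sample_skewness(right_tailed))
```

A single large value such as the 10 in `right_tailed` is enough to push skewness well above 1. A long right tail of nonzero TF-IDF weights does the same to tfidf_power.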
tfidf_new
Real number (ℝ)

HIGH CORRELATION, MISSING, ZEROS

Distinct: 3148
Distinct (%): 37.5%
Missing: 1895
Missing (%): 18.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.049130921
Minimum: 0
Maximum: 0.91233722
Zeros: 5151
Zeros (%): 50.0%
Negative: 0
Negative (%): 0.0%
Memory size: 160.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0.04811926
95th percentile: 0.25414037
Maximum: 0.91233722
Range: 0.91233722
Interquartile range (IQR): 0.04811926

Descriptive statistics

Standard deviation: 0.10372671
Coefficient of variation (CV): 2.1112308
Kurtosis: 14.168765
Mean: 0.049130921
Median Absolute Deviation (MAD): 0
Skewness: 3.3565078
Sum: 412.74887
Variance: 0.010759231
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value: count (frequency)
0: 5151 (50.0%)
0.04482418259: 4 (< 0.1%)
0.03013048127: 3 (< 0.1%)
0.03096911277: 3 (< 0.1%)
0.02995720209: 3 (< 0.1%)
0.01765745591: 3 (< 0.1%)
0.3539721059: 3 (< 0.1%)
0.01501974171: 3 (< 0.1%)
0.175638604: 3 (< 0.1%)
0.01719898031: 3 (< 0.1%)
Other values (3138): 3222 (31.3%)
(Missing): 1895 (18.4%)

Minimum 10 values
Value: count (frequency)
0: 5151 (50.0%)
0.006691672989: 2 (< 0.1%)
0.007903321665: 1 (< 0.1%)
0.008529143579: 1 (< 0.1%)
0.008817081325: 1 (< 0.1%)
0.009297964737: 1 (< 0.1%)
0.009390334329: 1 (< 0.1%)
0.009591194456: 2 (< 0.1%)
0.01005527864: 1 (< 0.1%)
0.01008939618: 1 (< 0.1%)

Maximum 10 values
Value: count (frequency)
0.912337222: 1 (< 0.1%)
0.893985339: 1 (< 0.1%)
0.8799837274: 1 (< 0.1%)
0.855717183: 1 (< 0.1%)
0.8442472452: 1 (< 0.1%)
0.8210144641: 1 (< 0.1%)
0.8153993827: 1 (< 0.1%)
0.8117790523: 1 (< 0.1%)
0.7967300018: 1 (< 0.1%)
0.7950674542: 1 (< 0.1%)

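Each numeric histogram uses "fixed size bins (bins=50)": the range from minimum to maximum is cut into 50 equal-width intervals. For tfidf_new (0 to 0.91233722) each bin is about 0.018 wide, and all of the zeros land in the first bin. The sketch below implements equal-width binning under numpy.histogram's edge convention, where every bin is half-open except the last, which is closed:

```python
def fixed_width_histogram(values, bins=50):
    """Counts per equal-width bin between min(values) and max(values).

    Bins are half-open [a, b), except the last, which also includes the maximum.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        idx = int((x - lo) / width) if width > 0 else 0
        counts[min(idx, bins - 1)] += 1   # clamp the maximum into the last bin
    return counts

# Hypothetical TF-IDF weights; half of them are zero, as in tfidf_new.
weights = [0, 0, 0, 0, 0.05, 0.12, 0.25, 0.91]
counts = fixed_width_histogram(weights, bins=10)
print(counts)
```

Because the bins are equal-width over the full range, a few large values like the 0.91 maximum stretch the scale. Most of the mass then ends up in the first bins, which matches the shape of the TF-IDF histograms above.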
title_status
Categorical

IMBALANCE, MISSING

Distinct: 6
Distinct (%): 0.1%
Missing: 174
Missing (%): 1.7%
Memory size: 160.9 KiB
clean: 9802
rebuilt: 183
salvage: 90
lien: 30
missing: 15

Length

Max length: 10
Median length: 5
Mean length: 5.0549299
Min length: 4

Characters and Unicode

Total characters: 51166
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: clean
2nd row: clean
3rd row: clean
4th row: clean
5th row: clean

Common Values

Value: count (frequency)
clean: 9802 (95.2%)
rebuilt: 183 (1.8%)
salvage: 90 (0.9%)
lien: 30 (0.3%)
missing: 15 (0.1%)
parts only: 2 (< 0.1%)
(Missing): 174 (1.7%)

Length

Histogram of lengths of the category

Common Values (Plot)

Words
Value: count (frequency)
clean: 9802 (96.8%)
rebuilt: 183 (1.8%)
salvage: 90 (0.9%)
lien: 30 (0.3%)
missing: 15 (0.1%)
parts: 2 (< 0.1%)
only: 2 (< 0.1%)

Most occurring characters

Value: count (frequency)
l: 10107 (19.8%)
e: 10105 (19.7%)
a: 9984 (19.5%)
n: 9849 (19.2%)
c: 9802 (19.2%)
i: 243 (0.5%)
r: 185 (0.4%)
t: 185 (0.4%)
u: 183 (0.4%)
b: 183 (0.4%)
Other values (8): 340 (0.7%)

Most occurring categories

Value: count (frequency)
(unknown): 51166 (100.0%)

Most occurring scripts

Value: count (frequency)
(unknown): 51166 (100.0%)

Most occurring blocks

Value: count (frequency)
(unknown): 51166 (100.0%)

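In title_status, the category "parts only" appears as two entries, "parts" and "only", in the word-level table that follows Common Values. That table's percentages are taken over 10124 words rather than over all 10296 rows, so clean reads 96.8% there versus 95.2% in Common Values. The sketch below reproduces those word counts, assuming a plain whitespace split:

```python
from collections import Counter

# title_status category counts (missing rows excluded), from the report.
categories = {"clean": 9802, "rebuilt": 183, "salvage": 90,
              "lien": 30, "missing": 15, "parts only": 2}

# Split each category on whitespace and weight every word by the category count.
# Assumption: a plain whitespace split; the report's own tokenizer may differ.
words = Counter()
for value, c in categories.items():
    for w in value.split():
        words[w] += c

total_words = sum(words.values())
for w, c in words.most_common():
    print(f"{w}: {c} ({100 * c / total_words:.1f}%)")
```

Word-level percentages are therefore not interchangeable with category percentages whenever a category contains a space.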
tfidf_vehicle
Real number (ℝ)

HIGH CORRELATION, MISSING, ZEROS

Distinct: 4497
Distinct (%): 53.5%
Missing: 1895
Missing (%): 18.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.05705624
Minimum: 0
Maximum: 0.78422321
Zeros: 3702
Zeros (%): 36.0%
Negative: 0
Negative (%): 0.0%
Memory size: 160.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0.023303367
Q3: 0.086514163
95th percentile: 0.22003102
Maximum: 0.78422321
Range: 0.78422321
Interquartile range (IQR): 0.086514163

Descriptive statistics

Standard deviation: 0.084877793
Coefficient of variation (CV): 1.4876163
Kurtosis: 7.0588055
Mean: 0.05705624
Median Absolute Deviation (MAD): 0.023303367
Skewness: 2.3728722
Sum: 479.32947
Variance: 0.0072042397
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value: count (frequency)
0: 3702 (36.0%)
0.1592240969: 6 (0.1%)
0.01812210289: 4 (< 0.1%)
0.3167593354: 4 (< 0.1%)
0.08072379225: 3 (< 0.1%)
0.04985596021: 3 (< 0.1%)
0.04172056326: 3 (< 0.1%)
0.01211148687: 3 (< 0.1%)
0.08849096061: 3 (< 0.1%)
0.0407401709: 3 (< 0.1%)
Other values (4487): 4667 (45.3%)
(Missing): 1895 (18.4%)

Minimum 10 values
Value: count (frequency)
0: 3702 (36.0%)
0.00409448288: 1 (< 0.1%)
0.005624156937: 1 (< 0.1%)
0.005836919751: 1 (< 0.1%)
0.00653643387: 1 (< 0.1%)
0.006560054913: 2 (< 0.1%)
0.006866828549: 1 (< 0.1%)
0.007592892725: 1 (< 0.1%)
0.007835780162: 1 (< 0.1%)
0.008041819881: 1 (< 0.1%)

Maximum 10 values
Value: count (frequency)
0.7842232101: 1 (< 0.1%)
0.5128763411: 1 (< 0.1%)
0.4927697212: 1 (< 0.1%)
0.4927295434: 1 (< 0.1%)
0.4925890991: 1 (< 0.1%)
0.4825611542: 1 (< 0.1%)
0.4822950945: 1 (< 0.1%)
0.480478726: 1 (< 0.1%)
0.4795482042: 1 (< 0.1%)
0.4795089707: 1 (< 0.1%)

lat
Real number (ℝ)

MISSING 

Distinct: 5063
Distinct (%): 49.7%
Missing: 111
Missing (%): 1.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.527681
Minimum: 19.5981
Maximum: 77.86064
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 160.9 KiB

Quantile statistics

Minimum: 19.5981
5th percentile: 28.3019
Q1: 34.445746
Median: 39.3
Q3: 42.37853
95th percentile: 47.054557
Maximum: 77.86064
Range: 58.26254
Interquartile range (IQR): 7.932784

Descriptive statistics

Standard deviation: 5.7883825
Coefficient of variation (CV): 0.15023958
Kurtosis: 1.3887709
Mean: 38.527681
Median Absolute Deviation (MAD): 3.702421
Skewness: 0.067027135
Sum: 392404.43
Variance: 33.505372
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value: count (frequency)
33.779214: 101 (1.0%)
40.468785: 92 (0.9%)
43.1824: 79 (0.8%)
33.7865: 77 (0.7%)
27.26977: 35 (0.3%)
47.1991: 34 (0.3%)
46.234838: 33 (0.3%)
47.696062: 33 (0.3%)
47.81247: 32 (0.3%)
36.17: 31 (0.3%)
Other values (5053): 9638 (93.6%)
(Missing): 111 (1.1%)

Minimum 10 values
Value: count (frequency)
19.5981: 1 (< 0.1%)
19.641782: 1 (< 0.1%)
19.646976: 1 (< 0.1%)
19.719349: 1 (< 0.1%)
20.77208: 1 (< 0.1%)
20.877965: 1 (< 0.1%)
20.886756: 3 (< 0.1%)
20.889768: 1 (< 0.1%)
20.89258: 1 (< 0.1%)
20.9174: 1 (< 0.1%)

Maximum 10 values
Value: count (frequency)
77.86064: 1 (< 0.1%)
64.93614: 1 (< 0.1%)
64.8378: 2 (< 0.1%)
64.81552: 1 (< 0.1%)
64.7805: 1 (< 0.1%)
64.0378: 1 (< 0.1%)
61.605649: 1 (< 0.1%)
61.573915: 1 (< 0.1%)
61.572407: 2 (< 0.1%)
61.56939: 3 (< 0.1%)

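The lat distribution is tightly clustered (IQR 7.932784) but has tails on both ends, from 19.5981 up to 77.86064. Tukey's 1.5 × IQR fences are one common way to shortlist such values for review. This is a generic heuristic that the report itself does not apply, sketched here from the reported quartiles:

```python
# Quartiles reported for lat.
q1 = 34.445746
q3 = 42.37853

iqr = q3 - q1                     # 7.932784, as in the report
lower_fence = q1 - 1.5 * iqr      # Tukey's conventional 1.5 * IQR rule
upper_fence = q3 + 1.5 * iqr

def is_outlier(lat):
    """Flag a latitude outside the Tukey fences (a heuristic, not a validity check)."""
    return lat < lower_fence or lat > upper_fence

for value in (19.5981, 38.527681, 64.93614, 77.86064):
    print(value, is_outlier(value))
```

The fences (about 22.55 to 54.28) also flag the latitudes around 19 to 21 and 61 to 65 in the extreme-value lists. Those may be legitimate Hawaii and Alaska listings, so flagged values are candidates for inspection rather than errors.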
transmission
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 47
Missing (%): 0.5%
Memory size: 160.9 KiB
automatic: 7819
other: 1818
manual: 612

Length

Max length: 9
Median length: 9
Mean length: 8.1113279
Min length: 5

Characters and Unicode

Total characters: 83133
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: automatic
2nd row: other
3rd row: other
4th row: automatic
5th row: automatic

Common Values

Value: count (frequency)
automatic: 7819 (75.9%)
other: 1818 (17.7%)
manual: 612 (5.9%)
(Missing): 47 (0.5%)

Length

Histogram of lengths of the category

Common Values (Plot)

Words
Value: count (frequency)
automatic: 7819 (76.3%)
other: 1818 (17.7%)
manual: 612 (6.0%)

Most occurring characters

Value: count (frequency)
t: 17456 (21.0%)
a: 16862 (20.3%)
o: 9637 (11.6%)
u: 8431 (10.1%)
m: 8431 (10.1%)
i: 7819 (9.4%)
c: 7819 (9.4%)
h: 1818 (2.2%)
e: 1818 (2.2%)
r: 1818 (2.2%)
Other values (2): 1224 (1.5%)

Most occurring categories

Value: count (frequency)
(unknown): 83133 (100.0%)

Most occurring scripts

Value: count (frequency)
(unknown): 83133 (100.0%)

Most occurring blocks

Value: count (frequency)
(unknown): 83133 (100.0%)

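The transmission length and character statistics can be reproduced from the three category counts alone. Each category contributes its string length, or each of its letters, once per row in which it appears. A sketch using only the reported counts:

```python
from collections import Counter

# Category counts for transmission (missing rows excluded), from the report.
counts = {"automatic": 7819, "other": 1818, "manual": 612}

n_present = sum(counts.values())
total_chars = sum(len(value) * c for value, c in counts.items())   # "Total characters"
mean_length = total_chars / n_present                               # "Mean length"

# Every letter of a category string is counted once per row holding that category.
char_counts = Counter()
for value, c in counts.items():
    for ch in value:
        char_counts[ch] += c

print(n_present, total_chars, round(mean_length, 4), char_counts.most_common(2))
```

The two "t"s in "automatic" plus the one in "other" explain why "t" (17456) outranks "a" (16862) even though "automatic" and "manual" each contain two "a"s.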
type
Categorical

HIGH CORRELATION, MISSING

Distinct: 13
Distinct (%): 0.2%
Missing: 2118
Missing (%): 20.6%
Memory size: 160.9 KiB
sedan: 2087
SUV: 1844
pickup: 1166
truck: 784
other: 536
Other values (8): 1761

Length

Max length: 11
Median length: 5
Mean length: 5.0606505
Min length: 3

Characters and Unicode

Total characters: 41386
Distinct characters: 25
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: convertible
2nd row: pickup
3rd row: hatchback
4th row: SUV
5th row: convertible

Common Values

Value: count (frequency)
sedan: 2087 (20.3%)
SUV: 1844 (17.9%)
pickup: 1166 (11.3%)
truck: 784 (7.6%)
other: 536 (5.2%)
coupe: 507 (4.9%)
hatchback: 442 (4.3%)
wagon: 266 (2.6%)
convertible: 217 (2.1%)
van: 195 (1.9%)
Other values (3): 134 (1.3%)
(Missing): 2118 (20.6%)

Length

Histogram of lengths of the category

Words
Value: count (frequency)
sedan: 2087 (25.5%)
suv: 1844 (22.5%)
pickup: 1166 (14.3%)
truck: 784 (9.6%)
other: 536 (6.6%)
coupe: 507 (6.2%)
hatchback: 442 (5.4%)
wagon: 266 (3.3%)
convertible: 217 (2.7%)
van: 195 (2.4%)
Other values (3): 134 (1.6%)

Most occurring characters

Value: count (frequency)
e: 3564 (8.6%)
c: 3558 (8.6%)
a: 3555 (8.6%)
n: 2993 (7.2%)
p: 2839 (6.9%)
u: 2468 (6.0%)
k: 2392 (5.8%)
s: 2098 (5.1%)
d: 2096 (5.1%)
t: 1979 (4.8%)
Other values (15): 13844 (33.5%)

Most occurring categories

Value: count (frequency)
(unknown): 41386 (100.0%)

Most occurring scripts

Value: count (frequency)
(unknown): 41386 (100.0%)

Most occurring blocks

Value: count (frequency)
(unknown): 41386 (100.0%)

tfidf_text
Real number (ℝ)

HIGH CORRELATION, MISSING, ZEROS

Distinct: 3467
Distinct (%): 41.3%
Missing: 1895
Missing (%): 18.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.040443381
Minimum: 0
Maximum: 1
Zeros: 4807
Zeros (%): 46.7%
Negative: 0
Negative (%): 0.0%
Memory size: 160.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0.070583751
95th percentile: 0.16262274
Maximum: 1
Range: 1
Interquartile range (IQR): 0.070583751

Descriptive statistics

Standard deviation: 0.061215884
Coefficient of variation (CV): 1.5136194
Kurtosis: 11.351614
Mean: 0.040443381
Median Absolute Deviation (MAD): 0
Skewness: 2.1949981
Sum: 339.76484
Variance: 0.0037473844
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value: count (frequency)
0: 4807 (46.7%)
0.1135405116: 4 (< 0.1%)
0.0633336666: 4 (< 0.1%)
0.08635350668: 3 (< 0.1%)
0.07838034789: 3 (< 0.1%)
0.05154347547: 3 (< 0.1%)
0.08026611417: 3 (< 0.1%)
0.08751470076: 3 (< 0.1%)
0.04232758613: 3 (< 0.1%)
0.06001624309: 3 (< 0.1%)
Other values (3457): 3565 (34.6%)
(Missing): 1895 (18.4%)

Minimum 10 values
Value: count (frequency)
0: 4807 (46.7%)
0.001192460091: 1 (< 0.1%)
0.002727602427: 1 (< 0.1%)
0.006422821511: 1 (< 0.1%)
0.006535357797: 1 (< 0.1%)
0.006589449878: 1 (< 0.1%)
0.00674613219: 1 (< 0.1%)
0.006765997612: 1 (< 0.1%)
0.007111321212: 1 (< 0.1%)
0.007628716632: 1 (< 0.1%)

Maximum 10 values
Value: count (frequency)
1: 1 (< 0.1%)
0.6114419968: 1 (< 0.1%)
0.4776100548: 1 (< 0.1%)
0.4635352866: 1 (< 0.1%)
0.4608938853: 1 (< 0.1%)
0.4487710757: 1 (< 0.1%)
0.4449453972: 1 (< 0.1%)
0.4322772687: 1 (< 0.1%)
0.4116430558: 1 (< 0.1%)
0.4092394588: 1 (< 0.1%)

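tfidf_text reaches exactly 1 while every tfidf_* column stays within [0, 1]. That pattern is consistent with TF-IDF weights that are L2-normalized per listing: a listing whose only vocabulary hit is a single term gets weight 1 for it. The report does not say how these columns were produced. The sketch below assumes scikit-learn-style defaults (smoothed IDF, L2 norm) and a hypothetical four-term vocabulary:

```python
import math
from collections import Counter

def tfidf_l2(docs, vocabulary):
    """TF-IDF over a fixed vocabulary with smoothed IDF and L2 row normalization.

    Assumption: mirrors scikit-learn TfidfVectorizer defaults (smooth_idf=True,
    norm="l2"); the pipeline behind the tfidf_* columns is not documented here.
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = {t: sum(1 for toks in tokenized if t in toks) for t in vocabulary}
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocabulary}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        raw = {t: tf[t] * idf[t] for t in vocabulary}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        rows.append({t: (v / norm if norm else 0.0) for t, v in raw.items()})
    return rows

vocab = ["auto", "car", "truck", "text"]          # hypothetical vocabulary
docs = ["text us today", "clean car clean truck", "no keywords here"]
rows = tfidf_l2(docs, vocab)
for row in rows:
    print(row)
```

Under this scheme a zero means "term absent from a listing that has text". A missing value more plausibly means no text was processed at all. That reading is speculative, but it fits every tfidf_* column sharing the same 1895 missing rows.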
org_manuf
Categorical

HIGH CORRELATION, MISSING

Distinct: 40
Distinct (%): 0.4%
Missing: 397
Missing (%): 3.9%
Memory size: 160.9 KiB
ford: 1784
chevrolet: 1305
toyota: 864
honda: 512
jeep: 430
Other values (35): 5004

Length

Max length: 15
Median length: 12
Mean length: 5.7937165
Min length: 3

Characters and Unicode

Total characters: 57352
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: bmw
2nd row: ford
3rd row: alfa-romeo
4th row: mercedes-benz
5th row: toyota

Common Values

Value: count (frequency)
ford: 1784 (17.3%)
chevrolet: 1305 (12.7%)
toyota: 864 (8.4%)
honda: 512 (5.0%)
jeep: 430 (4.2%)
ram: 422 (4.1%)
nissan: 419 (4.1%)
gmc: 403 (3.9%)
bmw: 358 (3.5%)
dodge: 294 (2.9%)
Other values (30): 3108 (30.2%)
(Missing): 397 (3.9%)

Length

Histogram of lengths of the category

Words
Value: count (frequency)
ford: 1784 (18.0%)
chevrolet: 1305 (13.2%)
toyota: 864 (8.7%)
honda: 512 (5.2%)
jeep: 430 (4.3%)
ram: 422 (4.3%)
nissan: 419 (4.2%)
gmc: 403 (4.1%)
bmw: 358 (3.6%)
dodge: 294 (3.0%)
Other values (30): 3108 (31.4%)

Most occurring characters

Value: count (frequency)
o: 6331 (11.0%)
e: 5668 (9.9%)
r: 4756 (8.3%)
a: 4570 (8.0%)
d: 3930 (6.9%)
t: 3365 (5.9%)
c: 2998 (5.2%)
n: 2696 (4.7%)
l: 2595 (4.5%)
i: 2332 (4.1%)
Other values (16): 18111 (31.6%)

Most occurring categories

Value: count (frequency)
(unknown): 57352 (100.0%)

Most occurring scripts

Value: count (frequency)
(unknown): 57352 (100.0%)

Most occurring blocks

Value: count (frequency)
(unknown): 57352 (100.0%)

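The org_manuf Common Values table lists the ten most frequent makes, folds the remaining 30 into "Other values (30)" (3108 rows), and reports missing rows separately. All of its percentages are taken over the 10296 rows. The sketch below reproduces that layout on a hypothetical column. The report's tie-breaking order for equally frequent values is not known:

```python
from collections import Counter

def top_n_with_other(values, n=10):
    """The n most frequent values, an 'Other values (k)' bucket, and a '(Missing)' row.

    Percentages use all rows (including missing ones) as the denominator.
    """
    present = [v for v in values if v is not None]
    counts = Counter(present)
    rows = list(counts.most_common(n))
    rest = counts.most_common()[n:]          # everything outside the top n
    if rest:
        rows.append((f"Other values ({len(rest)})", sum(c for _, c in rest)))
    missing = len(values) - len(present)
    if missing:
        rows.append(("(Missing)", missing))
    total = len(values)
    return [(label, c, round(100 * c / total, 1)) for label, c in rows]

# Hypothetical manufacturer column with one missing entry.
makes = ["ford"] * 4 + ["chevrolet"] * 3 + ["toyota"] * 2 + ["bmw", "ram", None]
table = top_n_with_other(makes, n=2)
for row in table:
    print(row)
```

The counts in such a table always sum to the row count, which is a quick integrity check. For org_manuf, 6791 (top ten) + 3108 (other) + 397 (missing) = 10296.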
year
Real number (ℝ)

HIGH CORRELATION 

Distinct: 92
Distinct (%): 0.9%
Missing: 23
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.1923
Minimum: 1900
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 160.9 KiB

Quantile statistics

Minimum: 1900
5th percentile: 1998
Q1: 2008
median: 2013
Q3: 2017
95th percentile: 2019.4
Maximum: 2021
Range: 121
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 9.5318846
Coefficient of variation (CV): 0.0047394199
Kurtosis: 20.913912
Mean: 2011.1923
Median Absolute Deviation (MAD): 4
Skewness: -3.6897563
Sum: 20660978
Variance: 90.856825
Monotonicity: Not monotonic
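These figures follow standard definitions. CV is the standard deviation divided by the mean (9.5318846 / 2011.1923 ≈ 0.00474). IQR is Q3 minus Q1 (2017 - 2008 = 9). MAD is the median of the absolute deviations from the median. The sketch below recomputes them with the standard library. It assumes the sample standard deviation and NumPy-style linear quartiles; the report does not state ydata-profiling's exact conventions.

```python
import statistics


def describe(values):
    """Recompute a subset of the report's descriptive statistics, ignoring None."""
    data = [v for v in values if v is not None]
    mean = statistics.mean(data)
    median = statistics.median(data)
    # "inclusive" matches NumPy's default linear interpolation between order statistics
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    std = statistics.stdev(data)  # sample standard deviation (n - 1 in the denominator)
    return {
        "mean": mean,
        "std": std,
        "variance": statistics.variance(data),
        "cv": std / mean,
        "iqr": q3 - q1,
        "mad": statistics.median([abs(v - median) for v in data]),
        "range": max(data) - min(data),
    }


print(describe([2008, 2013, 2017, 1998, 2019, None]))
```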
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
2018 | 895 | 8.7%
2017 | 846 | 8.2%
2014 | 742 | 7.2%
2015 | 740 | 7.2%
2013 | 733 | 7.1%
2016 | 726 | 7.1%
2019 | 610 | 5.9%
2012 | 555 | 5.4%
2011 | 525 | 5.1%
2020 | 478 | 4.6%
Other values (82) | 3423 | 33.2%
Minimum 10 values
Value | Count | Frequency (%)
1900 | 1 | < 0.1%
1916 | 1 | < 0.1%
1921 | 2 | < 0.1%
1923 | 1 | < 0.1%
1926 | 1 | < 0.1%
1927 | 1 | < 0.1%
1928 | 1 | < 0.1%
1929 | 2 | < 0.1%
1932 | 1 | < 0.1%
1933 | 1 | < 0.1%
Maximum 10 values
Value | Count | Frequency (%)
2021 | 36 | 0.3%
2020 | 478 | 4.6%
2019 | 610 | 5.9%
2018 | 895 | 8.7%
2017 | 846 | 8.2%
2016 | 726 | 7.1%
2015 | 740 | 7.2%
2014 | 742 | 7.2%
2013 | 733 | 7.1%
2012 | 555 | 5.4%

tfidf_truck
Real number (ℝ)

HIGH CORRELATION  MISSING  ZEROS 

Distinct: 1723
Distinct (%): 20.5%
Missing: 1895
Missing (%): 18.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.042502364
Minimum: 0
Maximum: 0.85359411
Zeros: 6573
Zeros (%): 63.8%
Negative: 0
Negative (%): 0.0%
Memory size: 160.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 0
95th percentile: 0.27592258
Maximum: 0.85359411
Range: 0.85359411
Interquartile range (IQR): 0
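The Zeros (%) line uses all 10296 rows as its denominator, but the quantiles are computed over the 8401 observed values. For non-negative data such as these weights, Q3 is 0 once zeros make up more than about three quarters of the observed values. That holds for tfidf_truck. It does not hold for tfidf_credit, whose Q3 is positive. The small sketch below does the arithmetic with counts copied from this report.

```python
TOTAL_ROWS = 10296
MISSING = 1895  # missing count reported for each tfidf_* column in this excerpt


def zero_shares(zeros, total=TOTAL_ROWS, missing=MISSING):
    """Return the share of zeros among all rows and among non-missing rows."""
    observed = total - missing
    return zeros / total, zeros / observed


for name, zeros in [("tfidf_truck", 6573), ("tfidf_credit", 5168)]:
    of_rows, of_observed = zero_shares(zeros)
    print(f"{name}: {of_rows:.1%} of rows, {of_observed:.1%} of observed values, "
          f"more than 75% zeros: {of_observed > 0.75}")
```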

Descriptive statistics

Standard deviation: 0.11931126
Coefficient of variation (CV): 2.8071675
Kurtosis: 16.730696
Mean: 0.042502364
Median Absolute Deviation (MAD): 0
Skewness: 3.8893696
Sum: 357.06236
Variance: 0.014235176
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 6573 | 63.8%
0.02115039754 | 6 | 0.1%
0.1181588295 | 3 | < 0.1%
0.1735436594 | 3 | < 0.1%
0.07947088907 | 3 | < 0.1%
0.1410554583 | 3 | < 0.1%
0.7894767623 | 3 | < 0.1%
0.7294941727 | 3 | < 0.1%
0.01930582874 | 3 | < 0.1%
0.7143425475 | 3 | < 0.1%
Other values (1713) | 1798 | 17.5%
(Missing) | 1895 | 18.4%
Minimum 10 values
Value | Count | Frequency (%)
0 | 6573 | 63.8%
0.006162356024 | 1 | < 0.1%
0.00618037779 | 1 | < 0.1%
0.006404506296 | 1 | < 0.1%
0.006412680234 | 1 | < 0.1%
0.006432176253 | 1 | < 0.1%
0.006445597622 | 1 | < 0.1%
0.006459968457 | 1 | < 0.1%
0.006471531092 | 1 | < 0.1%
0.006507569428 | 1 | < 0.1%
Maximum 10 values
Value | Count | Frequency (%)
0.8535941059 | 1 | < 0.1%
0.8258080166 | 1 | < 0.1%
0.797586157 | 1 | < 0.1%
0.7894767623 | 3 | < 0.1%
0.7877936592 | 1 | < 0.1%
0.7852018309 | 1 | < 0.1%
0.7782061692 | 1 | < 0.1%
0.7771934359 | 2 | < 0.1%
0.7761335711 | 2 | < 0.1%
0.7737457025 | 1 | < 0.1%
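The report does not say how the tfidf_* columns were produced. Their values lie between 0 and 1 and are exactly 0 for most rows. That pattern is consistent with L2-normalised TF-IDF weights for a small vocabulary computed from listing text. The sketch below is hypothetical and built on those assumptions. It uses raw term counts and the smoothed inverse document frequency ln((1 + n) / (1 + df)) + 1 that scikit-learn applies by default. A row that never mentions a vocabulary word gets 0 for that word.

```python
import math
from collections import Counter


def tfidf(docs, vocab):
    """Hypothetical TF-IDF restricted to `vocab`: raw counts x smoothed idf, L2-normalised rows."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(tokenised)
    # Document frequency: number of documents containing each vocabulary word.
    df = {w: sum(1 for tokens in tokenised if w in tokens) for w in vocab}
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights = [counts[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in weights))
        # A document containing none of the words keeps an all-zero row.
        rows.append([x / norm if norm else 0.0 for x in weights])
    return rows


docs = ["clean truck rear camera", "great car easy credit", "car and truck"]
for row in tfidf(docs, ["truck", "credit", "car"]):
    print([round(x, 3) for x in row])
```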

tfidf_credit
Real number (ℝ)

HIGH CORRELATION  MISSING  ZEROS 

Distinct: 3093
Distinct (%): 36.8%
Missing: 1895
Missing (%): 18.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.045251695
Minimum: 0
Maximum: 0.70253508
Zeros: 5168
Zeros (%): 50.2%
Negative: 0
Negative (%): 0.0%
Memory size: 160.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 0.069527976
95th percentile: 0.21285
Maximum: 0.70253508
Range: 0.70253508
Interquartile range (IQR): 0.069527976

Descriptive statistics

Standard deviation: 0.079613984
Coefficient of variation (CV): 1.7593592
Kurtosis: 7.6256345
Mean: 0.045251695
Median Absolute Deviation (MAD): 0
Skewness: 2.4290397
Sum: 380.15949
Variance: 0.0063383864
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 5168 | 50.2%
0.1641678138 | 6 | 0.1%
0.03014716913 | 4 | < 0.1%
0.2690607229 | 4 | < 0.1%
0.1317855871 | 3 | < 0.1%
0.04585700264 | 3 | < 0.1%
0.05083579423 | 3 | < 0.1%
0.0327790753 | 3 | < 0.1%
0.3014343525 | 3 | < 0.1%
0.1948055302 | 3 | < 0.1%
Other values (3083) | 3201 | 31.1%
(Missing) | 1895 | 18.4%
Minimum 10 values
Value | Count | Frequency (%)
0 | 5168 | 50.2%
0.002896921654 | 1 | < 0.1%
0.004415684614 | 1 | < 0.1%
0.007646757924 | 1 | < 0.1%
0.008482220306 | 1 | < 0.1%
0.009528079568 | 1 | < 0.1%
0.009969638014 | 1 | < 0.1%
0.01002152241 | 1 | < 0.1%
0.01075773294 | 1 | < 0.1%
0.0111376256 | 1 | < 0.1%
Maximum 10 values
Value | Count | Frequency (%)
0.7025350754 | 1 | < 0.1%
0.6954358808 | 2 | < 0.1%
0.6537140918 | 1 | < 0.1%
0.6009980876 | 1 | < 0.1%
0.5992340727 | 1 | < 0.1%
0.5950281433 | 1 | < 0.1%
0.5356420447 | 1 | < 0.1%
0.5351164781 | 1 | < 0.1%
0.5225999558 | 1 | < 0.1%
0.5122655147 | 1 | < 0.1%

carvana_ad
Boolean

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 90.5 KiB
False: 8401
True: 1895
Value | Count | Frequency (%)
False | 8401 | 81.6%
True | 1895 | 18.4%
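carvana_ad is True in 1895 rows. That is the same number as the missing cells in each tfidf_* column profiled here. Matching counts alone do not show that the same rows are involved. Below is a minimal plain-Python check. It assumes the records are stored as dictionaries with None for missing values; this layout is hypothetical, since the report does not show the raw data.

```python
def missing_matches_flag(rows, column, flag):
    """True if `column` is missing exactly in the rows where `flag` is True."""
    return all((row[column] is None) == bool(row[flag]) for row in rows)


rows = [
    {"tfidf_car": None, "carvana_ad": True},
    {"tfidf_car": 0.0, "carvana_ad": False},
    {"tfidf_car": 0.21, "carvana_ad": False},
]
print(missing_matches_flag(rows, "tfidf_car", "carvana_ad"))  # True
```

With pandas, an equivalent test would be `(df["tfidf_car"].isna() == df["carvana_ad"]).all()`, assuming `df` holds the profiled data.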

cylinders
Categorical

MISSING 

Distinct: 8
Distinct (%): 0.1%
Missing: 4248
Missing (%): 41.3%
Memory size: 160.9 KiB
6 cylinders: 2296
4 cylinders: 1842
8 cylinders: 1773
5 cylinders: 45
10 cylinders: 40
Other values (3): 52

Length

Max length: 12
Median length: 11
Mean length: 10.974041
Min length: 5
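For a categorical column, these length figures follow from the value counts alone. Mean length is the total number of characters divided by the number of non-missing values, here 66371 / 6048 ≈ 10.974041. The sketch below reproduces Total characters, Mean length, Max length and Min length from the cylinders counts in this report. Median length is left out because the report does not state how it is computed.

```python
def length_stats(value_counts):
    """Length summary for a categorical column given {value: count}, missing excluded."""
    n = sum(value_counts.values())
    total_chars = sum(len(value) * count for value, count in value_counts.items())
    lengths = [len(value) for value in value_counts]
    return {
        "total_characters": total_chars,
        "mean_length": total_chars / n,
        "max_length": max(lengths),
        "min_length": min(lengths),
    }


cylinders = {
    "6 cylinders": 2296, "4 cylinders": 1842, "8 cylinders": 1773,
    "5 cylinders": 45, "10 cylinders": 40, "other": 34,
    "3 cylinders": 11, "12 cylinders": 7,
}
print(length_stats(cylinders))
```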

Characters and Unicode

Total characters: 66371
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 6 cylinders
2nd row: 8 cylinders
3rd row: 8 cylinders
4th row: 4 cylinders
5th row: 4 cylinders

Common Values

Value | Count | Frequency (%)
6 cylinders | 2296 | 22.3%
4 cylinders | 1842 | 17.9%
8 cylinders | 1773 | 17.2%
5 cylinders | 45 | 0.4%
10 cylinders | 40 | 0.4%
other | 34 | 0.3%
3 cylinders | 11 | 0.1%
12 cylinders | 7 | 0.1%
(Missing) | 4248 | 41.3%
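The cylinder count is stored as text such as "6 cylinders", along with an "other" label. If the count is needed as a number, the minimal conversion sketch below can be used. Treating "other" as unknown (None) is an assumption, since the report does not say what that label covers.

```python
import re

_CYLINDERS = re.compile(r"^(\d+) cylinders$")


def cylinder_count(value):
    """Turn '6 cylinders' into 6; 'other', unmatched text and missing values become None."""
    if value is None:
        return None
    match = _CYLINDERS.match(value.strip())
    return int(match.group(1)) if match else None


print([cylinder_count(v) for v in ["6 cylinders", "10 cylinders", "other", None]])
# [6, 10, None, None]
```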

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
cylinders | 6014 | 49.9%
6 | 2296 | 19.0%
4 | 1842 | 15.3%
8 | 1773 | 14.7%
5 | 45 | 0.4%
10 | 40 | 0.3%
other | 34 | 0.3%
3 | 11 | 0.1%
12 | 7 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
e | 6048 | 9.1%
r | 6048 | 9.1%
c | 6014 | 9.1%
y | 6014 | 9.1%
(space) | 6014 | 9.1%
d | 6014 | 9.1%
l | 6014 | 9.1%
n | 6014 | 9.1%
i | 6014 | 9.1%
s | 6014 | 9.1%
Other values (11) | 6163 | 9.3%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 66371 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 66371 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 66371 | 100.0%

drive
Categorical

HIGH CORRELATION  MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 3076
Missing (%): 29.9%
Memory size: 160.9 KiB
4wd: 3175
fwd: 2494
rwd: 1551

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 21660
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: rwd
2nd row: fwd
3rd row: 4wd
4th row: rwd
5th row: fwd

Common Values

Value | Count | Frequency (%)
4wd | 3175 | 30.8%
fwd | 2494 | 24.2%
rwd | 1551 | 15.1%
(Missing) | 3076 | 29.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
4wd | 3175 | 44.0%
fwd | 2494 | 34.5%
rwd | 1551 | 21.5%
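The two drive tables use different denominators. Common Values divides by all 10296 rows, so the 3076 missing cells account for 29.9%. The table after Common Values (Plot) divides by the 7220 non-missing values, so 4wd rises from 30.8% to 44.0%. The small sketch below reproduces both percentage columns from the counts.

```python
def frequencies(counts, missing):
    """Map each category to (% of all rows, % of non-missing rows), rounded to 0.1."""
    present = sum(counts.values())
    total = present + missing
    return {
        value: (round(100 * c / total, 1), round(100 * c / present, 1))
        for value, c in counts.items()
    }


print(frequencies({"4wd": 3175, "fwd": 2494, "rwd": 1551}, missing=3076))
# {'4wd': (30.8, 44.0), 'fwd': (24.2, 34.5), 'rwd': (15.1, 21.5)}
```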

Most occurring characters

Value | Count | Frequency (%)
w | 7220 | 33.3%
d | 7220 | 33.3%
4 | 3175 | 14.7%
f | 2494 | 11.5%
r | 1551 | 7.2%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 21660 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 21660 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 21660 | 100.0%

tfidf_rear
Real number (ℝ)

HIGH CORRELATION  MISSING  ZEROS 

Distinct: 2727
Distinct (%): 32.5%
Missing: 1895
Missing (%): 18.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.04352736
Minimum: 0
Maximum: 0.63290076
Zeros: 5567
Zeros (%): 54.1%
Negative: 0
Negative (%): 0.0%
Memory size: 160.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 0.051563904
95th percentile: 0.24335664
Maximum: 0.63290076
Range: 0.63290076
Interquartile range (IQR): 0.051563904

Descriptive statistics

Standard deviation: 0.086448024
Coefficient of variation (CV): 1.9860617
Kurtosis: 6.5669413
Mean: 0.04352736
Median Absolute Deviation (MAD): 0
Skewness: 2.5146377
Sum: 365.67335
Variance: 0.0074732608
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 5567 | 54.1%
0.01726722651 | 6 | 0.1%
0.1886659891 | 4 | < 0.1%
0.06341779012 | 4 | < 0.1%
0.1575751949 | 3 | < 0.1%
0.02833627755 | 3 | < 0.1%
0.06340993549 | 3 | < 0.1%
0.07218915027 | 3 | < 0.1%
0.01576131688 | 3 | < 0.1%
0.01767244463 | 3 | < 0.1%
Other values (2717) | 2802 | 27.2%
(Missing) | 1895 | 18.4%
Minimum 10 values
Value | Count | Frequency (%)
0 | 5567 | 54.1%
0.005045672688 | 1 | < 0.1%
0.005184678231 | 1 | < 0.1%
0.005235324864 | 1 | < 0.1%
0.005251241453 | 1 | < 0.1%
0.005262198685 | 1 | < 0.1%
0.005273931063 | 1 | < 0.1%
0.005283370821 | 1 | < 0.1%
0.005308737512 | 1 | < 0.1%
0.00531279259 | 1 | < 0.1%
Maximum 10 values
Value | Count | Frequency (%)
0.6329007625 | 1 | < 0.1%
0.5939449776 | 1 | < 0.1%
0.5895443466 | 1 | < 0.1%
0.5637942842 | 1 | < 0.1%
0.5424831746 | 1 | < 0.1%
0.5301955365 | 1 | < 0.1%
0.5243134798 | 2 | < 0.1%
0.516264286 | 1 | < 0.1%
0.5146175047 | 1 | < 0.1%
0.5010833253 | 1 | < 0.1%

odometer
Real number (ℝ)

HIGH CORRELATION 

Distinct: 7258
Distinct (%): 70.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 89672.649
Minimum: 0
Maximum: 299200
Zeros: 21
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 160.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 7134
Q1: 36386
median: 83813.5
Q3: 133000
95th percentile: 200000
Maximum: 299200
Range: 299200
Interquartile range (IQR): 96614

Descriptive statistics

Standard deviation: 61158.144
Coefficient of variation (CV): 0.68201558
Kurtosis: -0.33728644
Mean: 89672.649
Median Absolute Deviation (MAD): 48102
Skewness: 0.540888
Sum: 9.2326959 × 10^8
Variance: 3.7403185 × 10^9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
1 | 43 | 0.4%
100000 | 37 | 0.4%
200000 | 36 | 0.3%
140000 | 35 | 0.3%
150000 | 35 | 0.3%
160000 | 32 | 0.3%
170000 | 29 | 0.3%
130000 | 27 | 0.3%
180000 | 25 | 0.2%
120000 | 24 | 0.2%
Other values (7248) | 9973 | 96.9%
Minimum 10 values
Value | Count | Frequency (%)
0 | 21 | 0.2%
1 | 43 | 0.4%
2 | 4 | < 0.1%
3 | 4 | < 0.1%
4 | 1 | < 0.1%
5 | 3 | < 0.1%
7 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 3 | < 0.1%
13 | 1 | < 0.1%
Maximum 10 values
Value | Count | Frequency (%)
299200 | 1 | < 0.1%
298813 | 1 | < 0.1%
298000 | 1 | < 0.1%
297600 | 1 | < 0.1%
297000 | 1 | < 0.1%
296062 | 1 | < 0.1%
296000 | 1 | < 0.1%
295000 | 2 | < 0.1%
293000 | 1 | < 0.1%
292300 | 1 | < 0.1%

paint_color
Categorical

MISSING 

Distinct: 12
Distinct (%): 0.2%
Missing: 2867
Missing (%): 27.8%
Memory size: 160.9 KiB
white: 1950
black: 1605
silver: 1109
red: 797
blue: 780
Other values (7): 1188

Length

Max length: 6
Median length: 5
Mean length: 4.7931081
Min length: 3

Characters and Unicode

Total characters: 35608
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: white
2nd row: black
3rd row: white
4th row: silver
5th row: red

Common Values

Value | Count | Frequency (%)
white | 1950 | 18.9%
black | 1605 | 15.6%
silver | 1109 | 10.8%
red | 797 | 7.7%
blue | 780 | 7.6%
grey | 565 | 5.5%
green | 184 | 1.8%
custom | 171 | 1.7%
brown | 146 | 1.4%
orange | 53 | 0.5%
Other values (2) | 69 | 0.7%
(Missing) | 2867 | 27.8%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
white | 1950 | 26.2%
black | 1605 | 21.6%
silver | 1109 | 14.9%
red | 797 | 10.7%
blue | 780 | 10.5%
grey | 565 | 7.6%
green | 184 | 2.5%
custom | 171 | 2.3%
brown | 146 | 2.0%
orange | 53 | 0.7%
Other values (2) | 69 | 0.9%

Most occurring characters

Value | Count | Frequency (%)
e | 5691 | 16.0%
l | 3616 | 10.2%
i | 3059 | 8.6%
r | 2870 | 8.1%
b | 2531 | 7.1%
w | 2149 | 6.0%
t | 2121 | 6.0%
h | 1950 | 5.5%
c | 1776 | 5.0%
a | 1658 | 4.7%
Other values (11) | 8187 | 23.0%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 35608 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 35608 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 35608 | 100.0%

fuel
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): < 0.1%
Missing: 60
Missing (%): 0.6%
Memory size: 160.9 KiB
gas: 8558
other: 789
diesel: 699
hybrid: 143
electric: 47

Length

Max length: 8
Median length: 3
Mean length: 3.4238961
Min length: 3

Characters and Unicode

Total characters: 35047
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: gas
2nd row: other
3rd row: other
4th row: diesel
5th row: gas

Common Values

Value | Count | Frequency (%)
gas | 8558 | 83.1%
other | 789 | 7.7%
diesel | 699 | 6.8%
hybrid | 143 | 1.4%
electric | 47 | 0.5%
(Missing) | 60 | 0.6%
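The alert list gives fuel an imbalance of 61.8%. That figure is reproduced by one minus the Shannon entropy of the category shares divided by log2 of the number of categories, computed over the five non-missing fuel types. Treat this as an assumption about how ydata-profiling scores imbalance, not as its documented formula.

```python
import math


def imbalance_score(counts):
    """1 - H / log2(k): 0 for equally sized classes, approaching 1 as one class dominates."""
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    if len(shares) < 2:
        raise ValueError("need at least two non-empty classes")
    entropy = -sum(p * math.log2(p) for p in shares)
    return 1 - entropy / math.log2(len(shares))


fuel = [8558, 789, 699, 143, 47]  # gas, other, diesel, hybrid, electric
print(f"{imbalance_score(fuel):.1%}")
```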

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
gas | 8558 | 83.6%
other | 789 | 7.7%
diesel | 699 | 6.8%
hybrid | 143 | 1.4%
electric | 47 | 0.5%

Most occurring characters

Value | Count | Frequency (%)
s | 9257 | 26.4%
g | 8558 | 24.4%
a | 8558 | 24.4%
e | 2281 | 6.5%
r | 979 | 2.8%
h | 932 | 2.7%
i | 889 | 2.5%
d | 842 | 2.4%
t | 836 | 2.4%
o | 789 | 2.3%
Other values (4) | 1126 | 3.2%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 35047 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 35047 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 35047 | 100.0%

long
Real number (ℝ)

MISSING 

Distinct: 5075
Distinct (%): 49.8%
Missing: 111
Missing (%): 1.1%
Infinite: 0
Infinite (%): 0.0%
Mean: -94.217343
Minimum: -159.38468
Maximum: 167.62991
Zeros: 0
Zeros (%): 0.0%
Negative: 10183
Negative (%): 98.9%
Memory size: 160.9 KiB

Quantile statistics

Minimum: -159.38468
5th percentile: -122.44918
Q1: -110.96
median: -87.924
Q3: -80.8419
95th percentile: -72.999092
Maximum: 167.62991
Range: 327.01459
Interquartile range (IQR): 30.1181

Descriptive statistics

Standard deviation: 18.184987
Coefficient of variation (CV): -0.19301103
Kurtosis: 5.1282479
Mean: -94.217343
Median Absolute Deviation (MAD): 10.0517
Skewness: -0.37995077
Sum: -959603.63
Variance: 330.69374
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
-84.411811 | 101 | 1.0%
-74.281707 | 92 | 0.9%
-84.1122 | 79 | 0.8%
-84.4454 | 77 | 0.7%
-82.48229 | 35 | 0.3%
-122.3151 | 34 | 0.3%
-119.128015 | 33 | 0.3%
-116.781406 | 33 | 0.3%
-122.32164 | 32 | 0.3%
-117.236949 | 25 | 0.2%
Other values (5065) | 9644 | 93.7%
(Missing) | 111 | 1.1%
Minimum 10 values
Value | Count | Frequency (%)
-159.384676 | 1 | < 0.1%
-159.3448 | 1 | < 0.1%
-158.030906 | 2 | < 0.1%
-158.02241 | 1 | < 0.1%
-158.0124 | 2 | < 0.1%
-158.00528 | 1 | < 0.1%
-157.9269 | 1 | < 0.1%
-157.903554 | 1 | < 0.1%
-157.902016 | 1 | < 0.1%
-157.900292 | 2 | < 0.1%
Maximum 10 values
Value | Count | Frequency (%)
167.629911 | 1 | < 0.1%
94.1632 | 1 | < 0.1%
-67.84049 | 1 | < 0.1%
-68.7778 | 1 | < 0.1%
-68.805028 | 2 | < 0.1%
-68.856 | 1 | < 0.1%
-69.462948 | 1 | < 0.1%
-69.6826 | 1 | < 0.1%
-70.112543 | 2 | < 0.1%
-70.169672 | 2 | < 0.1%

manufacturer
Categorical

HIGH CORRELATION  MISSING 

Distinct: 10
Distinct (%): 0.1%
Missing: 3505
Missing (%): 34.0%
Memory size: 160.9 KiB
ford: 1784
chevrolet: 1305
toyota: 864
honda: 512
jeep: 430
Other values (5): 1896

Length

Max length: 9
Median length: 6
Mean length: 5.2831689
Min length: 3

Characters and Unicode

Total characters: 35878
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: bmw
2nd row: ford
3rd row: toyota
4th row: jeep
5th row: ford

Common Values

Value | Count | Frequency (%)
ford | 1784 | 17.3%
chevrolet | 1305 | 12.7%
toyota | 864 | 8.4%
honda | 512 | 5.0%
jeep | 430 | 4.2%
ram | 422 | 4.1%
nissan | 419 | 4.1%
gmc | 403 | 3.9%
bmw | 358 | 3.5%
dodge | 294 | 2.9%
(Missing) | 3505 | 34.0%
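manufacturer shares its ten leading counts with the brand column profiled before year (presumably org_manuf, given the correlation alert). Its 3505 missing cells also equal that column's 397 missing cells plus its 3108 "Other values (30)". This is consistent with manufacturer keeping the ten most frequent brands and blanking the rest, although the report does not document the derivation. The sketch below shows a hypothetical top-k collapse of that kind.

```python
from collections import Counter


def keep_top_k(values, k):
    """Keep the k most frequent non-missing values and map everything else to None.

    Hypothetical helper: the report does not say how `manufacturer` was built.
    """
    counts = Counter(v for v in values if v is not None)
    top = {value for value, _ in counts.most_common(k)}
    return [v if v in top else None for v in values]


brands = ["ford", "ford", "bmw", "toyota", "toyota", "toyota", None, "lada"]
print(keep_top_k(brands, 2))
# ['ford', 'ford', None, 'toyota', 'toyota', 'toyota', None, None]
```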

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
ford | 1784 | 26.3%
chevrolet | 1305 | 19.2%
toyota | 864 | 12.7%
honda | 512 | 7.5%
jeep | 430 | 6.3%
ram | 422 | 6.2%
nissan | 419 | 6.2%
gmc | 403 | 5.9%
bmw | 358 | 5.3%
dodge | 294 | 4.3%

Most occurring characters

Value | Count | Frequency (%)
o | 5623 | 15.7%
e | 3764 | 10.5%
r | 3511 | 9.8%
t | 3033 | 8.5%
d | 2884 | 8.0%
a | 2217 | 6.2%
h | 1817 | 5.1%
f | 1784 | 5.0%
c | 1708 | 4.8%
n | 1350 | 3.8%
Other values (11) | 8187 | 22.8%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 35878 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 35878 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 35878 | 100.0%

region
Text

Distinct: 392
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Memory size: 160.9 KiB

Length

Max length: 26
Median length: 20
Mean length: 11.504856
Min length: 4

Characters and Unicode

Total characters: 118454
Distinct characters: 52
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 10
Unique (%): 0.1%
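Distinct counts the different region names (392). Unique counts only the names that occur exactly once (10, about 0.1% of the 10296 rows). A minimal sketch of both counts:

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different non-missing values; Unique = values seen exactly once."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


print(distinct_and_unique(["phoenix", "holland", "phoenix", "lawrence", None]))  # (3, 2)
```

A column whose distinct count is 1, such as description_exists, is the case the CONSTANT flag marks.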

Sample

1st row: fredericksburg
2nd row: holland
3rd row: lawrence
4th row: phoenix
5th row: san luis obispo
Value | Count | Frequency (%)
| 1587 | 8.7%
city | 317 | 1.7%
st | 216 | 1.2%
new | 213 | 1.2%
bay | 206 | 1.1%
san | 189 | 1.0%
south | 189 | 1.0%
county | 170 | 0.9%
jersey | 163 | 0.9%
central | 163 | 0.9%
Other values (479) | 14782 | 81.2%
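The region table above counts tokens rather than whole names. The minimal sketch below counts tokens in the same way. It assumes lower-casing, whitespace splitting and stripping of surrounding punctuation; the report does not show ydata-profiling's exact tokenisation. Under these assumptions, a token made only of punctuation, such as "/", is reduced to an empty string. That is one possible source of an unlabeled row like the first entry. The example names are illustrative.

```python
import string
from collections import Counter


def word_counts(values):
    """Count lower-cased, whitespace-separated tokens with surrounding punctuation stripped."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        for token in value.lower().split():
            counts[token.strip(string.punctuation)] += 1
    return counts


names = ["san luis obispo", "kansas city, mo", "fort collins / north co"]
print(word_counts(names).most_common(5))
```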

Most occurring characters

Value | Count | Frequency (%)
a | 11691 | 9.9%
e | 9864 | 8.3%
o | 8879 | 7.5%
n | 8536 | 7.2%
(space) | 7899 | 6.7%
s | 7706 | 6.5%
l | 7200 | 6.1%
r | 7019 | 5.9%
t | 7009 | 5.9%
i | 6612 | 5.6%
Other values (42) | 36039 | 30.4%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 118454 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 118454 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 118454 | 100.0%

tfidf_car
Real number (ℝ)

HIGH CORRELATION  MISSING  ZEROS 

Distinct: 3077
Distinct (%): 36.6%
Missing: 1895
Missing (%): 18.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.042918373
Minimum: 0
Maximum: 0.78660514
Zeros: 5227
Zeros (%): 50.8%
Negative: 0
Negative (%): 0.0%
Memory size: 160.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 0.044508372
95th percentile: 0.22920906
Maximum: 0.78660514
Range: 0.78660514
Interquartile range (IQR): 0.044508372

Descriptive statistics

Standard deviation: 0.089204078
Coefficient of variation (CV): 2.078459
Kurtosis: 11.247861
Mean: 0.042918373
Median Absolute Deviation (MAD): 0
Skewness: 3.0882239
Sum: 360.55725
Variance: 0.0079573676
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 5227 | 50.8%
0.0601891201 | 4 | < 0.1%
0.1119129965 | 4 | < 0.1%
0.1246509489 | 3 | < 0.1%
0.02836665501 | 3 | < 0.1%
0.1445830141 | 3 | < 0.1%
0.04499993186 | 3 | < 0.1%
0.03181529492 | 3 | < 0.1%
0.2129619994 | 3 | < 0.1%
0.1522412458 | 3 | < 0.1%
Other values (3067) | 3145 | 30.5%
(Missing) | 1895 | 18.4%
Minimum 10 values
Value | Count | Frequency (%)
0 | 5227 | 50.8%
0.005014388554 | 1 | < 0.1%
0.005051061027 | 1 | < 0.1%
0.005076840055 | 1 | < 0.1%
0.005855108942 | 1 | < 0.1%
0.006321366936 | 1 | < 0.1%
0.00668286744 | 2 | < 0.1%
0.006757677952 | 2 | < 0.1%
0.006771362061 | 1 | < 0.1%
0.006809621865 | 1 | < 0.1%
Maximum 10 values
Value | Count | Frequency (%)
0.7866051387 | 1 | < 0.1%
0.6972666432 | 1 | < 0.1%
0.6649944034 | 1 | < 0.1%
0.6543392397 | 1 | < 0.1%
0.6285982605 | 1 | < 0.1%
0.6256799014 | 1 | < 0.1%
0.6219419389 | 1 | < 0.1%
0.6168228805 | 1 | < 0.1%
0.5967558639 | 1 | < 0.1%
0.5932451852 | 1 | < 0.1%

description_exists
Boolean

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 90.5 KiB
True: 10296
Value | Count | Frequency (%)
True | 10296 | 100.0%

Interactions

2024-10-24T12:29:09.306885image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:13.954121image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:19.726049image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:22.693644image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:24.598464image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:27.053469image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:29.449844image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:33.093451image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:36.748384image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:28:55.570378image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:28:58.394053image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:00.614221image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:02.913588image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:05.236137image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:07.162461image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:09.429549image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:14.506942image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:19.894637image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:22.821264image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:24.783934image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:27.231004image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:30.130090image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:33.265991image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:36.913608image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:28:55.800760image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:28:58.586537image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:00.748860image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:03.238718image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:05.370427image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:07.300125image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:09.581111image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:15.178970image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:20.102094image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:22.967873image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:24.910593image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:27.622946image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:30.512070image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:33.531318image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:37.048256image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:28:56.019210image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:28:58.757079image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:00.874561image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:03.583830image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:05.482128image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:07.423255image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:09.797527image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:15.649294image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:20.330223image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:23.086553image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:25.024323image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:27.781522image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:30.830227image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:33.696880image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:37.199888image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:28:56.335330image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:28:58.911747image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:00.999228image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:03.725417image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:05.606793image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:07.570862image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:10.026373image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:15.947059image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:20.517604image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:23.219203image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:25.152944image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:27.946117image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:31.130448image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:33.859568image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:37.337482image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:28:56.599631image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:28:59.066332image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:01.130930image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:03.882032image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:05.720797image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:07.719499image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:10.362989image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:17.174793image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:20.675554image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:23.353840image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:25.414246image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:28.080756image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:31.330914image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:33.995391image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:37.487082image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:28:56.827424image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:28:59.217929image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:01.257962image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:04.010655image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:05.844457image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:07.866121image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:10.712054image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:17.516893image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:20.839208image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:23.473521image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:25.563069image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:28.208811image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:31.594216image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:34.240742image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:37.708491image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:28:57.082002image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:28:59.369521image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:01.381629image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:04.138345image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:05.967097image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:08.007785image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:10.908680image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:17.770449image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:21.002768image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:23.589249image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:25.701701image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:28.333481image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:31.805651image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/
2024-10-24T12:29:34.746380image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/

Correlations

carvana_ad condition cylinders drive fuel lat long manufacturer odometer org_manuf paint_color price tfidf_auto tfidf_car tfidf_credit tfidf_miles tfidf_new tfidf_power tfidf_rear tfidf_text tfidf_truck tfidf_vehicle title_status transmission type year
carvana_ad 1.000 0.617 0.277 0.161 0.403 0.187 0.159 0.133 0.525 0.277 0.185 0.457 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.084 0.878 0.360 0.283
condition 0.617 1.000 0.098 0.096 0.151 0.061 0.075 0.055 0.200 0.088 0.081 0.171 0.030 0.068 0.096 0.067 0.081 0.042 0.000 0.023 0.107 0.059 0.163 0.399 0.146 0.122
cylinders 0.277 0.098 1.000 0.388 0.189 0.028 0.037 0.211 0.067 0.320 0.105 0.144 0.000 0.046 0.039 0.066 0.000 0.017 0.057 0.033 0.128 0.042 0.059 0.174 0.248 0.101
drive 0.161 0.096 0.388 1.000 0.161 0.158 0.041 0.394 0.105 0.458 0.116 0.258 0.070 0.116 0.061 0.069 0.041 0.068 0.065 0.035 0.190 0.085 0.041 0.132 0.557 0.183
fuel 0.403 0.151 0.189 0.161 1.000 0.044 0.030 0.183 0.130 0.336 0.083 0.193 0.031 0.047 0.040 0.028 0.000 0.014 0.023 0.034 0.147 0.031 0.031 0.266 0.238 0.081
lat 0.187 0.061 0.028 0.158 0.044 1.000 -0.025 0.041 0.067 0.046 0.031 -0.043 0.081 -0.028 -0.033 -0.037 -0.028 0.068 0.089 0.020 0.015 0.124 0.000 0.115 0.077 -0.067
long 0.159 0.075 0.037 0.041 0.030 -0.025 1.000 0.055 0.012 0.060 0.012 -0.080 -0.055 -0.012 -0.124 0.014 0.045 -0.064 -0.091 -0.089 -0.012 -0.125 0.016 0.088 0.041 -0.019
manufacturer 0.133 0.055 0.211 0.394 0.183 0.041 0.055 1.000 0.051 1.000 0.095 0.101 0.022 0.053 0.032 0.036 0.014 0.031 0.037 0.025 0.096 0.039 0.031 0.098 0.255 0.062
odometer 0.525 0.200 0.067 0.105 0.130 0.067 0.012 0.051 1.000 0.085 0.051 -0.611 -0.031 -0.079 -0.097 -0.027 -0.014 -0.065 -0.158 -0.071 0.021 -0.188 0.016 0.350 0.092 -0.646
org_manuf 0.277 0.088 0.320 0.458 0.336 0.046 0.060 1.000 0.085 1.000 0.110 0.239 0.000 0.083 0.052 0.049 0.035 0.041 0.051 0.048 0.116 0.031 0.021 0.193 0.267 0.099
paint_color 0.185 0.081 0.105 0.116 0.083 0.031 0.012 0.095 0.051 0.110 1.000 0.071 0.013 0.036 0.032 0.055 0.038 0.037 0.019 0.041 0.070 0.060 0.014 0.136 0.092 0.089
price 0.457 0.171 0.144 0.258 0.193 -0.043 -0.080 0.101 -0.611 0.239 0.071 1.000 0.072 -0.043 0.236 -0.080 -0.005 0.165 0.257 0.124 0.304 0.285 0.036 0.300 0.147 0.680
tfidf_auto 1.000 0.030 0.000 0.070 0.031 0.081 -0.055 0.022 -0.031 0.000 0.013 0.072 1.000 0.062 0.273 -0.112 -0.098 0.121 0.115 0.172 -0.009 0.232 0.000 0.045 0.037 0.143
tfidf_car 1.000 0.068 0.046 0.116 0.047 -0.028 -0.012 0.053 -0.079 0.083 0.036 -0.043 0.062 1.000 0.184 0.046 0.157 -0.063 0.015 0.094 -0.061 0.118 0.079 0.063 0.100 0.004
tfidf_credit 1.000 0.096 0.039 0.061 0.040 -0.033 -0.124 0.032 -0.097 0.052 0.032 0.236 0.273 0.184 1.000 -0.096 -0.002 0.021 0.057 0.311 0.082 0.414 0.039 0.080 0.044 0.255
tfidf_miles 1.000 0.067 0.066 0.069 0.028 -0.037 0.014 0.036 -0.027 0.049 0.055 -0.080 -0.112 0.046 -0.096 1.000 0.172 0.004 -0.080 -0.108 0.004 -0.106 0.039 0.052 0.041 -0.064
tfidf_new 1.000 0.081 0.000 0.041 0.000 -0.028 0.045 0.014 -0.014 0.035 0.038 -0.005 -0.098 0.157 -0.002 0.172 1.000 -0.074 -0.012 -0.115 0.092 -0.043 0.017 0.098 0.045 -0.095
tfidf_power 1.000 0.042 0.017 0.068 0.014 0.068 -0.064 0.031 -0.065 0.041 0.037 0.165 0.121 -0.063 0.021 0.004 -0.074 1.000 0.393 -0.012 0.065 0.154 0.022 0.050 0.054 0.157
tfidf_rear 1.000 0.000 0.057 0.065 0.023 0.089 -0.091 0.037 -0.158 0.051 0.019 0.257 0.115 0.015 0.057 -0.080 -0.012 0.393 1.000 0.054 0.054 0.234 0.000 0.031 0.056 0.205
tfidf_text 1.000 0.023 0.033 0.035 0.034 0.020 -0.089 0.025 -0.071 0.048 0.041 0.124 0.172 0.094 0.311 -0.108 -0.115 -0.012 0.054 1.000 0.021 0.230 0.043 0.039 0.035 0.148
tfidf_truck 1.000 0.107 0.128 0.190 0.147 0.015 -0.012 0.096 0.021 0.116 0.070 0.304 -0.009 -0.061 0.082 0.004 0.092 0.065 0.054 0.021 1.000 0.091 0.008 0.030 0.174 0.041
tfidf_vehicle 1.000 0.059 0.042 0.085 0.031 0.124 -0.125 0.039 -0.188 0.031 0.060 0.285 0.232 0.118 0.414 -0.106 -0.043 0.154 0.234 0.230 0.091 1.000 0.000 0.055 0.065 0.307
title_status 0.084 0.163 0.059 0.041 0.031 0.000 0.016 0.031 0.016 0.021 0.014 0.036 0.000 0.079 0.039 0.039 0.017 0.022 0.000 0.043 0.008 0.000 1.000 0.062 0.029 0.140
transmission 0.878 0.399 0.174 0.132 0.266 0.115 0.088 0.098 0.350 0.193 0.136 0.300 0.045 0.063 0.080 0.052 0.098 0.050 0.031 0.039 0.030 0.055 0.062 1.000 0.299 0.277
type 0.360 0.146 0.248 0.557 0.238 0.077 0.041 0.255 0.092 0.267 0.092 0.147 0.037 0.100 0.044 0.041 0.045 0.054 0.056 0.035 0.174 0.065 0.029 0.299 1.000 0.103
year 0.283 0.122 0.101 0.183 0.081 -0.067 -0.019 0.062 -0.646 0.099 0.089 0.680 0.143 0.004 0.255 -0.064 -0.095 0.157 0.205 0.148 0.041 0.307 0.140 0.277 0.103 1.000
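Alerts such as "price is highly overall correlated with odometer and 1 other fields" correspond to pairs in this matrix whose coefficient is large in magnitude. Below is a minimal sketch of that thresholding, using three numeric entries copied from the matrix. The cutoff of 0.5 is a hypothetical value chosen for illustration, not the profiler's confirmed setting, and `high_correlation_pairs` is a hypothetical helper.

```python
import pandas as pd

# Three numeric pairs copied from the correlation matrix in this report.
cols = ["odometer", "price", "year"]
corr = pd.DataFrame(
    [[1.000, -0.611, -0.646],
     [-0.611, 1.000, 0.680],
     [-0.646, 0.680, 1.000]],
    index=cols,
    columns=cols,
)

# ASSUMPTION: illustrative cutoff only; the report's actual threshold may differ.
THRESHOLD = 0.5


def high_correlation_pairs(matrix: pd.DataFrame, threshold: float) -> list[tuple[str, str, float]]:
    """Return (a, b, r) for every unordered pair of columns with |r| >= threshold."""
    pairs = []
    names = list(matrix.columns)
    for i, a in enumerate(names):
        # Walk the upper triangle only: skips the diagonal and the mirrored half.
        for b in names[i + 1:]:
            r = float(matrix.loc[a, b])
            if abs(r) >= threshold:  # strong negative correlations count too
                pairs.append((a, b, r))
    return pairs


for a, b, r in high_correlation_pairs(corr, THRESHOLD):
    print(f"{a} ~ {b}: {r:+.3f}")
```

Using the absolute value means that `odometer` and `year` (-0.646) are flagged just like `price` and `year` (+0.680), which matches the alerts listing both directions.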

Missing values

A simple visualization of nullity by column.
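A nullity-by-column chart plots, for each column, how many cells hold a value. The pandas sketch below computes the underlying counts on a hypothetical toy frame. The column names are borrowed from this report, but the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: real column names, invented values.
df = pd.DataFrame({
    "price": [4500, 6500, 7905, 2500],
    "tfidf_auto": [0.0, np.nan, np.nan, 0.2],
    "cylinders": ["6 cylinders", None, None, None],
})

# Non-missing cells per column (the bar heights) and the share that is missing.
non_null = df.notna().sum()
missing_pct = df.isna().mean() * 100

print(pd.DataFrame({"non_null": non_null, "missing_%": missing_pct}))
```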
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
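A nullity matrix has one row per record and one column per variable, and it shades the cells that hold a value. The sketch below builds the same mask on a hypothetical toy frame. Instead of plotting it with Matplotlib, it renders the mask as text, with `#` for present cells and `.` for missing ones. `nullity_rows` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: real column names, invented values.
df = pd.DataFrame({
    "price": [4500, 6500, 7905, 2500],
    "tfidf_auto": [0.0, np.nan, np.nan, 0.2],
    "cylinders": ["6 cylinders", None, "4 cylinders", None],
})


def nullity_rows(frame: pd.DataFrame) -> list[str]:
    """One string per record: '#' where a value is present, '.' where it is missing."""
    mask = frame.notna().to_numpy()  # boolean array, shape (rows, columns)
    return ["".join("#" if present else "." for present in row) for row in mask]


print(list(df.columns))
for line in nullity_rows(df):
    print(line)
```

Note that a value of `0.0` counts as present. Only `NaN`/`None` mark a cell as missing.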
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
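This sketch assumes that nullity correlation is the Pearson correlation between per-column missingness indicators, where 1 marks a missing cell and 0 marks a present one. The toy frame is hypothetical. The two tfidf columns are missing on exactly the same rows, `cylinders` is missing on an unrelated pattern, and `price` is never missing. `nullity_correlation` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: real column names, invented values.
df = pd.DataFrame({
    "price": [4500, 6500, 7905, 2500, 8950, 3200],
    "tfidf_auto": [0.0, np.nan, np.nan, 0.2, 0.0, np.nan],
    "tfidf_car": [0.1, np.nan, np.nan, 0.0, 0.3, np.nan],
    "cylinders": ["6 cylinders", None, "4 cylinders", None, "8 cylinders", "4 cylinders"],
})


def nullity_correlation(frame: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between the missingness indicators of each column."""
    missing = frame.isna().astype(float)  # 1.0 = missing, 0.0 = present
    # Columns that are always or never missing have zero variance, so their
    # correlation is undefined; drop them before correlating.
    varying = missing.loc[:, missing.var() > 0]
    return varying.corr()


print(nullity_correlation(df).round(3))
```

A value near 1 means two columns tend to be missing together, as with `tfidf_auto` and `tfidf_car` here. A value near 0 means the missingness of one column says little about the other.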

Sample

tfidf_autopricemodeltfidf_milesconditionstatetfidf_powertfidf_newtitle_statustfidf_vehiclelattransmissiontypetfidf_textorg_manufyeartfidf_trucktfidf_creditcarvana_adcylindersdrivetfidf_rearodometerpaint_colorfuellongmanufacturerregiontfidf_cardescription_exists
1898460.028732129893 series0.023903Noneva0.00.056003clean0.022641NaNautomaticconvertible0.079128bmw2011.00.00.280134False6 cylindersrwd0.02946599178whitegasNaNbmwfredericksburg0.055929True
306451NaN35990ranger supercrew lariatNaN-1miNaNNaNcleanNaN42.770000otherpickupNaNford2020.0NaNNaNTrueNoneNoneNaN4250blackother-86.100000fordhollandNaNTrue
282208NaN31990romeo stelvio ti sportNaN-1ksNaNNaNcleanNaN38.960000otherhatchbackNaNalfa-romeo2018.0NaNNaNTrueNoneNoneNaN24008whiteother-95.250000NonelawrenceNaNTrue
2403960.00000034500s3200.000000Noneaz0.00.000000clean0.00000034.365500automaticNone0.000000mercedes-benz1998.00.00.000000FalseNoneNone0.000000100000Nonediesel-112.129600Nonephoenix0.000000True
1529410.2305203995corolla0.0958891ca0.00.000000clean0.00000035.122159automaticNone0.158715toyota2007.00.00.056189FalseNonefwd0.000000253148silvergas-120.626169toyotasan luis obispo0.056091True
238380.0000005000grand cherokee0.000000-1fl0.00.000000clean0.00000030.421200automaticSUV0.000000jeep2005.00.00.000000False8 cylinders4wd0.204941125000redgas-86.892600jeeppensacola0.000000True
1927600.0000006500mustang convertible0.1596491tx0.00.000000clean0.00000033.334000manualconvertible0.000000ford2002.00.00.000000False8 cylindersrwd0.000000140000blackgas-96.750000forddallas / fort worth0.000000True
1195870.0000007905versa0.000000Nonenv0.00.000000clean0.00000036.143249automatichatchback0.000000nissan2008.00.00.000000False4 cylindersfwd0.00000028954blackgas-115.226909nissanlas vegas0.000000True
2361320.0000002500tahoe0.171397Nonear0.00.000000salvage0.00000035.093700automaticNone0.189131chevrolet2004.00.00.000000FalseNoneNone0.000000187000Nonegas-91.907400chevroletlittle rock0.000000True
240980.0000008950murano0.000000-1ny0.00.000000clean0.00000041.187900automaticSUV0.000000nissan2009.00.00.000000False4 cylindersfwd0.00000094075silvergas-73.167700nissannew york city0.697267True
tfidf_autopricemodeltfidf_milesconditionstatetfidf_powertfidf_newtitle_statustfidf_vehiclelattransmissiontypetfidf_textorg_manufyeartfidf_trucktfidf_creditcarvana_adcylindersdrivetfidf_rearodometerpaint_colorfuellongmanufacturerregiontfidf_cardescription_exists
1997910.00000012514soul0.024247Nonega0.0000000.028404clean0.022967NaNautomaticwagon0.080266kia2016.00.00.284162False4 cylindersfwd0.02988889291blackgasNaNNonecolumbus0.028367True
297102NaN17990sonata se sedan 4dNaN-1scNaNNaNcleanNaN32.780000othersedanNaNhyundai2018.0NaNNaNTrueNonefwdNaN25065silvergas-79.990000NonecharlestonNaNTrue
972010.0000007990murano0.0000001ny0.0000000.041311clean0.02226940.859538automaticSUV0.064856nissan2010.00.00.110210FalseNone4wd0.000000114955Nonegas-73.075599nissanlong island0.041257True
1642020.00000069001998 Jep Wrangler0.000000-1va0.0000000.200189clean0.00000037.125600manualSUV0.000000None1998.00.00.000000False4 cylinders4wd0.000000185820redgas-76.446900Nonenorfolk / hampton roads0.000000True
1666300.00000019995gx 4700.0079261va0.0000000.018571clean0.01501636.917094automaticSUV0.043732lexus2007.00.00.055736FalseNone4wd0.00000093954whitegas-76.232240Nonenorfolk / hampton roads0.009273True
2256630.0000004500a6 3.2 quattro0.0000001ia0.3615370.158313clean0.00000041.729432automaticsedan0.074562audi2005.00.00.000000False6 cylindersfwd0.083293163600whitegas-93.604582Nonedes moines0.000000True
570620.2013577495g350.000000Noneoh0.0000000.000000clean0.05289239.355403automaticsedan0.123232infiniti2008.00.00.000000FalseNone4wd0.000000106704Nonegas-84.396202Nonecincinnati0.000000True
318902NaN29990tacoma double cab pickupNaN-1caNaNNaNcleanNaN33.779214otherpickupNaNtoyota2012.0NaNNaNTrue6 cylinders4wdNaN43182whitegas-84.411811toyotamercedNaNTrue
324054NaN12590spark ev 1lt hatchbackNaN-1caNaNNaNcleanNaN36.600000otherhatchbackNaNchevrolet2016.0NaNNaNTrueNonefwdNaN26063silverelectric-121.880000chevroletmonterey bayNaNTrue
1994500.0000003200tribute0.089603-1mi0.0000000.000000clean0.08487442.982100automaticSUV0.000000mazda2004.00.00.000000False6 cylindersrwd0.000000150000blackgas-83.734000Noneflint0.104828True

Duplicate rows

Most frequently occurring

tfidf_autopricemodeltfidf_milesconditionstatetfidf_powertfidf_newtitle_statustfidf_vehiclelattransmissiontypetfidf_textorg_manufyeartfidf_trucktfidf_creditcarvana_adcylindersdrivetfidf_rearodometerpaint_colorfuellongmanufacturerregiontfidf_cardescription_exists# duplicates
80.023998renegade0.000000NaNmt0.0550540.0clean0.31675947.798900automaticSUV0.113541jeep2019.00.0000000.030147False4 cylinders4wd0.06341813120NaNgas-116.742300jeepbozeman0.060189True3
00.06990accord0.000000NaNfl0.0000000.0clean0.14744725.869874automaticsedan0.128826honda2009.00.0587580.045608FalseNaNNaN0.000000128500greengas-80.242697hondasouth florida0.182112True2
10.07900highlander0.000000-1wi0.0762760.0clean0.13503543.120500automaticSUV0.000000toyota2009.00.0000000.000000False6 cylindersfwd0.000000190524customgas-89.352300toyotamadison0.000000True2
20.010899yukon xl0.000000NaNoh0.0000000.0clean0.08670841.418454automaticother0.101009gmc2007.00.0691060.107280FalseNaNNaN0.000000161585bluegas-81.720190gmcakron / canton0.053546True2
30.014995a40.000000NaNva0.0000000.0clean0.00000038.259970automaticsedan0.174646audi2014.00.0000000.139115FalseNaNNaN0.00000078600blackgas-77.493210NaNfredericksburg0.046291True2
40.018998mazda60.000000NaNmt0.0592770.0clean0.28858547.696062automaticsedan0.122249mazda2016.00.0000000.032460False4 cylindersfwd0.06828275890NaNgas-116.781406NaNbillings0.097209True2
50.020850transit t3500.000000NaNok0.0000000.0clean0.000000NaNautomaticother0.000000ford2015.00.3866260.000000FalseNaNNaN0.000000169031blackdieselNaNfordoklahoma city0.059915True2
60.020900silverado 1500 ltz 4x40.0495312pa0.0000000.0clean0.00000041.135956automaticpickup0.000000chevrolet2011.00.0747860.058048False8 cylindersNaN0.000000110273greygas-75.364945chevroletscranton / wilkes-barre0.057947True2
70.022777wrangler0.000000NaNnh0.0000000.0clean0.03706343.066264automaticSUV0.086354jeep2013.00.1181590.045857FalseNaN4wd0.00000097822blackgas-71.447000jeepnew hampshire0.000000True2
90.032999a80.000000NaNnj0.0000000.0clean0.00000040.920150automaticsedan0.135668audi2015.00.0000000.144090FalseNaNNaN0.00000040035NaNgas-74.193960NaNnorth jersey0.191784True2
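The "# duplicates" column counts how many times each repeated row occurs. Below is a minimal pandas sketch of how such a table can be built. It assumes that rows count as duplicates when they are identical across all columns and that missing values compare as equal, which may not match the profiler's exact rule. The toy frame and the helper `most_frequent_duplicates` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: "renegade" appears three times, "accord" twice.
df = pd.DataFrame({
    "model": ["renegade", "accord", "renegade", "accord", "renegade", "tahoe"],
    "year": [2019.0, 2009.0, 2019.0, 2009.0, 2019.0, 2004.0],
    "paint_color": [np.nan, "green", np.nan, "green", np.nan, "red"],
})


def most_frequent_duplicates(frame: pd.DataFrame) -> pd.DataFrame:
    """Distinct repeated rows with their occurrence counts, most frequent first."""
    dupes = frame[frame.duplicated(keep=False)]  # every copy of any repeated row
    return (
        dupes.groupby(list(frame.columns), dropna=False)  # keep NaN as a group key
        .size()
        .reset_index(name="# duplicates")
        .sort_values("# duplicates", ascending=False, ignore_index=True)
    )


print(most_frequent_duplicates(df))
# Extra copies beyond the first occurrence of each row.
print(int(df.duplicated().sum()))
```

Passing `dropna=False` to `groupby` matters here, because several duplicated rows in the table above contain `NaN`. Without it, those rows would silently drop out of the counts.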